Abstract

We propose definitions of QAC°, the quantum analog of the classical class AC° of constant-depth
circuits with AND and OR gates of arbitrary fan-in, and QACC[q], the analog of the class ACC[q]
where Mod_q gates are also allowed. We prove that parity or fanout allows us to construct quantum
MOD_q gates in constant depth for any q, so QACC[2] = QACC. More generally, we show that
for any q, p > 1, MOD_q is equivalent to MOD_p (up to constant depth). This implies that QAC°
with unbounded fanout gates, denoted QAC°_wf, is the same as QACC[q] and QACC for all q. Since
ACC[p] ≠ ACC[q] whenever p and q are distinct primes, QACC[q] is strictly more powerful than its
classical counterpart, as is QAC° when fanout is allowed. This adds to the growing list of quantum
complexity classes which are provably more powerful than their classical counterparts.

We also develop techniques for proving upper bounds for QACC in terms of related language
classes. We define classes of languages EQACC, NQACC and BQACC_Q. We define a notion of
log-planar QACC operators and show the appropriately restricted versions of EQACC and NQACC
are contained in P/poly. We also define a notion of log-gate restricted QACC operators and show
the appropriately restricted versions of EQACC and NQACC are contained in TC°.



1 Introduction 

Advances in quantum computation in the last decade have been among the most notable in 
theoretical computer science. This is due to the surprising improvements in the efficiency of 
solving several fundamental combinatorial problems using quantum mechanical methods in place 
of their classical counterparts. These advances led to considerable efforts in finding new efficient 
quantum algorithms for classical problems and in developing a complexity theory of quantum 
computation. 

While most of the original results in quantum computation were developed using quantum 
Turing machines, they can also be formulated in terms of quantum circuits, which yield a more 
natural model of quantum computation. For example, Shor has shown that quantum circuits
can factor integers more efficiently than any known classical algorithm for factoring. Moreover, quantum
circuits have been shown (see Yao) to provide a universal model for quantum computation.

The theory of circuit complexity has long been an important branch of theoretical computer 
science. Shallow circuits correspond to parallel algorithms that can be performed in small amounts 
of time on a massively parallel computer with constant communication delays, and so circuit 
complexity can be thought of as a study of how to solve problems in parallel. In addition, some 
low-lying circuit classes have beautiful algebraic characterizations.

In [18, 19], Moore and Nilsson suggested a definition of QNC, the quantum analog of the class
NC of problems solvable by circuits with polylogarithmic depth and polynomial size. Here,
we will study quantum versions of some additional circuit classes. Recall the following definitions:

1. NC^k consists of problems solvable by families of circuits of AND, OR, and NOT gates with
depth O(log^k n) and size polynomial in n, where n is the size of the input, and where the
AND and OR gates have just two inputs each.

2. AC^k is like NC^k, but where we allow AND and OR gates with unbounded fan-in, i.e. arbitrary
numbers of inputs, in each layer of the circuit.

3. ACC^k[q] is like AC^k, but where we also allow Mod_q gates with unbounded fan-in, where
Mod_q(x_1, ..., x_n) outputs 1 iff the sum of the inputs is not a multiple of q.

4. ACC^k = ∪_q ACC^k[q].

5. NC = ∪_k NC^k = ∪_k AC^k = ∪_k ACC^k.
Then we have

AC° ⊂ ACC°[2] ⊂ ACC° ⊆ NC^1 ⊆ ··· ⊆ NC.

In fact, these first two inclusions are known to be proper [13, 22, 30]. Neither Majority nor
Parity is in AC°, while the latter is trivially in ACC°[2]. In addition, ACC°[p] and ACC°[q] are
known to be incomparable whenever p and q are distinct primes. Thus these classes give us some
of the few strict inclusions known in computational complexity theory. However, for all anyone
knows, ACC°[6] could contain PP, NP, and the entire polynomial hierarchy!

Quantum analogs of AC° and ACC are defined and studied here. One central class that we
examine is a quantum analog of AC° that we denote QAC°_wf. QAC°_wf is the class of families of
operators which can be built out of products of constantly many layers consisting of polynomial-sized
tensor products of one-qubit gates (analogous to NOT's), Toffoli gates (analogous to AND's
and OR's) and fan-out gates. The subscript "wf" in the notation denotes "with fan-out." The
idea of fan-out in the quantum setting is subtle, as is made clear in Section 3 of this paper.
The sub-class of QAC°_wf that does not include fan-out gates is denoted simply QAC°. An analog
of ACC[q] (i.e., ACC circuit families only allowing Mod_q gates) is QACC[q], defined similarly to
QAC°_wf, but replacing the fan-out gates with quantum Mod_q gates (which we denote as MOD_q).
The class QACC is ∪_q QACC[q].

In this paper, we prove a number of results about QAC and QACC, and address some definitional 
difficulties. We show that an ability to form a "cat state" with n qubits, or fan out a qubit into n 
copies in constant depth, is equivalent to being able to construct an n-ary parity gate in constant
depth. We discuss how best to compare these circuit classes to classical ones. 

We prove the surprising result that, for any integer q > 1, QAC°_wf = QACC[q] = QACC. This
is in sharp contrast to the classical result of Smolensky that says ACC°[q] ≠ ACC°[p] for any
pair of distinct primes q, p, which implies that for any prime p, AC° ⊂ ACC°[p] ⊂ ACC°. This result
shows that parity gates are as powerful as any other mod gates in QACC, and more generally,
that any MOD_q gate is as good as any other, up to polynomial size and constant depth. Thus we
conclude that QAC°_wf, or, for any q, QACC[q], is strictly more powerful than ACC[q] and AC°.

We also develop methods for proving upper bounds for QACC. The definition of QACC immediately
leads to a problem in this regard: QACC is a class of operators that only have a natural
interpretation quantum mechanically. In order to clarify the relationship with classical computation
we assign properties to QACC circuits based on measurements we can perform on them. In
particular, we define several natural language classes related to QACC. These language classes
arise from considering a quantum circuit family in the class and specifying a condition on the
expectation of observing a particular state after applying a circuit from the family to an input
state. The condition might simply be that the expectation is non-zero, or that it is bounded
away from zero by some constant, or that it is exactly equal to some constant. We call the
language classes obtained by these conditions on the expectation NQACC, BQACC and EQACC,
respectively. For example, the class NQACC corresponds to the case where x is in the language
if the expectation of the observed state after applying the QACC operator is non-zero. This is
analogous to the definition of the class NQP as defined in Adleman et al. and discussed in
Fenner et al. In this way we obtain natural classes of languages which correspond to those
defined classically by families of small depth circuits. In these terms, for example, we can more
succinctly and precisely express the statement "QACC[q] is strictly more powerful than ACC[q]"
by writing ACC[q] ⊂ EQACC[q].

We desire upper bounds showing that these language classes are contained in classically defined
circuit classes, thus delimiting the power of these quantum computations. In particular, we
believe that the languages arising in this way from our definitions are contained within TC°, those
problems computed by constant-depth threshold circuits. We have been unable to verify this,
and in fact the only classical upper bound for these language classes that we know of is the very
powerful counting class coC_=P. We do give some evidence for this proposed TC° upper
bound here and further provide some techniques which may prove useful in solving this problem.

Our methods result in upper bounds for restricted QACC circuits. Roughly speaking, we show
that QACC is no more powerful than P/poly provided that a layer of "wire-crossings" in the QACC
operator can be written as log many compositions of Kronecker products of controlled-not gates.
We call this class QACC^log_pl, where the "pl" is for this planarity condition. We show that if one further
restricts attention to the case where the number of multi-line gates (gates whose input is more
than 1 qubit) is log-bounded then the circuits are no more powerful than TC°. We call this class
QACC^log_gates. These results hold for arbitrary complex amplitudes in the QACC circuits.

In terms of our language classes, we show that NQACC^log_gates is in TC° and NQACC^log_pl is in P/poly.
Although the proof uses some of the techniques developed by Fenner, Green, Homer and Pruim
and by Yamakami and Yao to show that NQP_C = coC_=P, the small depth circuit case
presents technical challenges not present in their setting. In particular, given a QACC operator
built out of layers M_1, ..., M_t and an input state $|x, 0^{p(n)}\rangle$, we must show that a TC° circuit
can keep track of the amplitudes of each possible resulting state as each layer is applied. After
all layers have been applied, the TC° circuit then needs to be able to check that the amplitude
of one possible state is non-zero. Unfortunately, there could be exponentially many states with
non-zero amplitudes after applying a layer. To handle this problem we introduce the idea of a
"tensor-graph," a new way to represent a collection of states. We can extract from these graphs
(via TC° or P/poly computations) whether the amplitude of any particular vector is non-zero.

The exponential growth in the number of states is one of the primary obstacles to proving 
that all of NQACC is in TC° (or even P/poly), and thus the tensor graph formalism represents a 
significant step towards such an upper bound. The reason the bounds apply only in the restricted 
cases is that although tensor graphs can represent any QACC operator, in the case of operators 
with layers that might do arbitrary permutations, the top-down approach we use to compute a 
desired amplitude from the graph no longer seems to work. We feel that it is likely that the 
amplitude of any vector in a tensor graph can be written as a polynomial product of a polynomial 
sum in some extension algebra of the ones we work with in this paper, in which case it is quite 
likely it can be evaluated in TC°. 

Another important obstacle to obtaining a TC° upper bound is that one needs to be able 
to add and multiply a polynomial number of complex amplitudes that may appear in a QACC 
computation. We solve this problem. It reduces to adding and multiplying polynomially many 
elements of a certain transcendental extension of the rational numbers. We show that in fact TC° 
is closed under iterated addition and multiplication of such numbers (Lemma 5.1 below). This
result is of independent interest, and our application of tensor-graphs and these closure properties 
of TC° may prove useful in further investigations of small-depth quantum circuits. 

We now discuss the organization of the rest of this paper. Section 2 contains definitions for the
quantum operator classes we will be considering as well as other background definitions. Section 3
shows the constant-depth quantum circuit equivalence of fan-out and parity gates. Section 4
establishes for arbitrary p and q the constant-depth quantum equivalence of Mod_p and Mod_q.
Section 5 contains our upper bound results. Finally, the last section has a conclusion and some
open problems.

Preliminary versions of these results have appeared previously.

2 Preliminaries

In this section we define the gates used as building blocks for our quantum circuits. Classes 
of operators built out of these gates are then defined. We define language classes that can be 
determined by these operators and give a couple of definitions from algebra. Lastly, some closure 
properties of TC° are described. 

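Definition 2.1 We define various quantum gates as follows:

• By a one-qubit gate we mean an operator from the group U(2).

• Let $U = \begin{pmatrix} u_{00} & u_{01} \\ u_{10} & u_{11} \end{pmatrix} \in U(2)$. $\wedge_m(U)$ is defined as: $\wedge_0(U) = U$ and for $m > 0$, $\wedge_m(U)$ is

$$\wedge_m(U)(|\vec{x}, y\rangle) = \begin{cases} u_{y0}|\vec{x}, 0\rangle + u_{y1}|\vec{x}, 1\rangle & \text{if } \wedge_{k=1}^{m} x_k = 1 \\ |\vec{x}, y\rangle & \text{otherwise} \end{cases}$$

• Let $X = \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. A Toffoli gate is a $\wedge_m(X)$ gate for some $m > 0$. A controlled-not
gate is a $\wedge_1(X)$ gate.

• The Hadamard gate is the one-qubit gate $H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$.

• An (m-)spaced controlled-not gate is an operator that maps $|y_1, \ldots, y_m, x\rangle$ to $|x \oplus y_1, y_2, \ldots, y_m, x\rangle$
or $|x, y_1, \ldots, y_m\rangle$ to $|x, y_1, \ldots, y_{m-1}, y_m \oplus x\rangle$.

• An (m-ary) fan-out gate F is an operator that maps $|y_1, \ldots, y_m, x\rangle$ to $|x \oplus y_1, \ldots, x \oplus y_m, x\rangle$.

• The classical Boolean Mod_q-function on n bits is defined so that $\mathrm{Mod}_q(x_1, \ldots, x_n) = 1$ iff
$\sum_{i=1}^{n} x_i \not\equiv 0 \bmod q$. We also define $\mathrm{Mod}_{q,r}(x_1, \ldots, x_n)$ to output 1 iff $\sum_{i=1}^{n} x_i \equiv r \bmod q$. A
quantum MOD_q gate is an operator that maps $|y_1, \ldots, y_m, x\rangle$ to $|y_1, \ldots, y_m, x \oplus \mathrm{Mod}_q(y_1, \ldots, y_m)\rangle$.
A quantum MOD_{q,r} gate maps $|y_1, \ldots, y_m, x\rangle$ to $|y_1, \ldots, y_m, x \oplus \mathrm{Mod}_{q,r}(y_1, \ldots, y_m)\rangle$. We
write ¬MOD_q for MOD_{q,0}. A parity gate is a MOD_2 gate.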

Note that, since negation is built into the output (via the exclusive OR), it is easy to simulate
negations using MOD_{q,r} gates (unlike the classical case). For example, by setting b = 1, we can
compute ¬Mod_{q,r}. More generally, using one work bit, it is possible to simulate "¬MOD_{q,r}",
defined so that

$$|x_1, \ldots, x_n, b\rangle \mapsto |x_1, \ldots, x_n, b \oplus (\neg \mathrm{Mod}_{q,r}(x_1, \ldots, x_n))\rangle$$

using just MOD_{q,r} and a controlled-not gate. Thus MOD_q and ¬MOD_q are equivalent up to
constant depth. Finally, observe that $\mathrm{MOD}_{q,r}^{-1} = \mathrm{MOD}_{q,r}$.
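
As an illustration only, the following Python sketch tracks how MOD_q and MOD_{q,r} act on computational basis states; it does not simulate superpositions. The function names are hypothetical, and the negated gate is obtained here by flipping the target bit with X, which is a simplification assumed for this sketch rather than the work-bit construction mentioned above.

```python
from itertools import product


def mod_q(bits, q, r=None):
    """Classical Mod_q (r is None): 1 iff sum(bits) is not a multiple of q.
    Classical Mod_{q,r} (r given): 1 iff sum(bits) is congruent to r mod q."""
    s = sum(bits) % q
    return int(s != 0) if r is None else int(s == r % q)


def quantum_mod(state, q, r=None):
    """Action of MOD_q (or MOD_{q,r}) on a basis state (y_1, ..., y_m, x)."""
    *ys, x = state
    return (*ys, x ^ mod_q(ys, q, r))


def negated_mod(state, q, r):
    """b -> b XOR (NOT Mod_{q,r}(ys)): MOD_{q,r} followed by X on the target.
    (Assumption of this sketch: negation is done with a single X gate.)"""
    *ys, b = quantum_mod(state, q, r)
    return (*ys, b ^ 1)


if __name__ == "__main__":
    q = 3
    for state in product((0, 1), repeat=5):
        # XOR-ing the same Boolean value twice restores the target bit,
        # so the gate is its own inverse on basis states.
        assert quantum_mod(quantum_mod(state, q), q) == state
        # MOD_{q,0} computes the negation of Mod_q.
        assert mod_q(state[:-1], q, 0) == 1 - mod_q(state[:-1], q)
    print(quantum_mod((1, 1, 0, 0, 0), q))
```

Because each gate only permutes basis states, checking the basis-state action is enough to confirm that the gate is self-inverse.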

Figure 1. Our notation for n-ary Toffoli, controlled-U, and MOD_q gates, fanout gates, symmetric
phase shift gates, and the Hadamard gate. On the top right, we show a useful identity between
the controlled-not, the controlled π-shift, and the Hadamard gate. On the bottom right we show a
controlled-U gate with one of its inputs negated by conjugation with X.

We will use the notation in Figure 1 for our various gates.

As discussed in further detail in Section 3 below, the no-cloning theorem of quantum mechanics
makes it difficult to directly fan out qubits in constant depth (although constant fan-out in
constant depth is no problem, since we can make multiple copies of the inputs). Thus it is necessary
to define the operator F as in the above definition. Also, in the literature it is frequently the case
that one says a given operator M on $|y_1, \ldots, y_m\rangle$ can be written as a tensor product of certain
gates $M_j$. What is meant is that there is a permutation operator $\Pi$ (a map from $|y_1, \ldots, y_m\rangle$ to
$|y_{\pi(1)}, \ldots, y_{\pi(m)}\rangle$ for some permutation $\pi$) such that

$$M|y_1, \ldots, y_m\rangle = \Pi \left( \bigotimes_{j=1}^{n} M_j \right) \Pi^{-1} |y_1, \ldots, y_m\rangle$$

where the $M_j$'s are our base gates, i.e. those gates for which no inherent ordering on the $y_i$ is
assumed a priori, and $\otimes$ is the Kronecker product, which flattens a tensor product into a matrix
with blocks indexed in a particular way. Since it is important to keep track of such details
in our upper bounds proofs, we will always use Kronecker products of the form $\bigotimes_{j=1}^{n} M_j$ without
unspoken permutations. Nevertheless, being able to do permutation operators (not conjugation
by a permutation) intuitively allows our circuits to simulate classical wire crossings. To handle
permutations, we allow our circuits to have controlled-not layers. A controlled-not layer is a
gate which performs, in one step, controlled-nots between an arbitrary collection of disjoint pairs
of lines in its domain. That is, it performs $\Pi \left( \bigotimes_{i=1}^{n} \wedge_1(X) \right) \Pi^{-1}$ for some permutation operator $\Pi$.
It is easy to see [18] that any permutation can be written as a product of a constant number
of controlled-not layers. We say a controlled-not layer is log-depth if it can be written as the
composition of log many matrices each of which is the Kronecker product of identities and spaced
controlled-not gates.

$M^{\otimes n}$ is the n-fold Kronecker product of M with itself.

Definition 2.2

QAC^k is the class of families {F_n}, where F_n is in $U(2^{n+p(n)})$, p a polynomial, and each F_n is
writable as a product of O(log^k n) layers, where a layer is a Kronecker product of one-qubit gates
and Toffoli gates or is a controlled-not layer. Also for all n the number of distinct types of one-qubit
gates used must be fixed.

QACC^k[q] is the same as QAC^k except we also allow MOD_q gates. QACC^k = ∪_q QACC^k[q].
QAC^k_wf is the same as QAC^k but we also allow fan-out gates.

QACC is defined as QACC° and QACC[q] is defined as QACC°[q]. QACC^log_pl is QACC restricted to
log-depth controlled-not layers. QACC^log_gates is QACC restricted so that the total number of multi-line
gates in all layers is log-bounded.

If C is one of the above classes and K ⊆ ℂ, then C_K are the families in C with coefficients restricted
to K.

Let {F_n} and {G_n}, with $G_n, F_n \in U(2^n)$, be families of operators. We say {F_n} is QAC° reducible
to {G_n} if there is a family {R_n}, $R_n \in U(2^{n+p(n)})$, of QAC° operators augmented with operators
from {G_n} such that for all n and all $x, y \in \{0, 1\}^n$, there is a setting of $z_1, \ldots, z_{p(n)} \in \{0, 1\}$ for which
$\langle y|F_n|x\rangle = \langle y, z|R_n|x, z\rangle$. Operator families are QAC° equivalent if they are QAC° reducible to
each other. If C_1 and C_2 are families of QAC° equivalent operators, we write C_1 = C_2.

We refer to the z_i's above as "work bits" (also called "ancillae" in [18]). Note that in proving
QAC° equivalence, the work bits must be returned to their original values in a computation so
that they are disentangled from the rest of the circuit, and can be re-used by subsequent layers.

It follows for any {F_n} ∈ QAC° that F_n is writable as a product of a finite number of layers. In
an earlier paper, Moore [16] places no restriction on the number of distinct types of one-qubit
gates used in a given family of operators. Here we restrict these so that the number of distinct
amplitudes which appear in matrices in a layer is fixed with respect to n. This restriction arises
implicitly in the quantum Turing machine case of the upper bounds proofs in Fenner et al.
and Yamakami and Yao [31]. Also, it seems fairly natural since in the classical case one builds
circuits using a fixed number of distinct gate types. Our classes here are, thus, more "uniform"
than those defined earlier [16]. We now define language classes based on our classes of operator
families.

Definition 2.3 Let C be a class of families of $U(2^{n+p(n)})$ operators where p is a polynomial and
n = |x|.

1. E-C is the class of languages L such that for some {F_n} ∈ C and $\{\langle \vec{z}_n|\} = \{\langle z_{n,1}, \ldots, z_{n,n+p(n)}|\}$
a family of states, $m := |\langle \vec{z}_n|F_n|x, 0^{p(n)}\rangle|^2$ is 1 or 0 and x ∈ L iff m = 1.

2. N-C is the class of languages L such that for some {F_n} ∈ C and $\{\langle \vec{z}_n|\}$ a family of states,
x ∈ L iff $|\langle \vec{z}_n|F_n|x, 0^{p(n)}\rangle|^2 > 0$.

3. B-C is the class of languages L where for some {F_n} ∈ C and $\{\langle \vec{z}_n|\}$, x ∈ L if $|\langle \vec{z}_n|F_n|x, 0^{p(n)}\rangle|^2 > 3/4$
and x ∉ L if $|\langle \vec{z}_n|F_n|x, 0^{p(n)}\rangle|^2 < 1/4$.

It follows that E-C ⊆ N-C and E-C ⊆ B-C. We frequently will omit the '-' when writing a class, so
E-QACC is written as EQACC. Let $|\Psi\rangle := F_n|x, 0^{p(n)}\rangle$. Notice that $|\langle \vec{z}_n|F_n|x, 0^{p(n)}\rangle|^2 = \langle \Psi|P_{|\vec{z}_n\rangle}|\Psi\rangle$,
where $P_{|\vec{z}_n\rangle}$ is the projection matrix onto $|\vec{z}_n\rangle$. We could allow in our definitions measurements of
up to polynomially many such projection observables and not affect our results below. However,
this would shift the burden of the computation in some sense away from the QACC operator and
instead onto preparation of the observable.
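
The three acceptance conditions can be compared side by side with a small sketch. The Python function below is hypothetical and purely illustrative: it takes a single amplitude ⟨z|F|x,0⟩ and reports what each of the E-, N- and B-style conditions says about membership. The floating-point tolerance `tol` is an assumption of this sketch; the definitions themselves are exact.

```python
def acceptance(amplitude, tol=1e-12):
    """Classify p = |<z|F|x,0>|^2 under the E-, N- and B-style conditions.

    Each entry is True (x in L), False (x not in L) or None (this value of p
    is not allowed by the promise of that class).
    `tol` is a floating-point tolerance assumed only for this sketch.
    """
    p = abs(amplitude) ** 2
    is_zero = p <= tol
    is_one = abs(p - 1) <= tol
    return {
        "E": True if is_one else (False if is_zero else None),
        "N": not is_zero,
        "B": True if p > 0.75 else (False if p < 0.25 else None),
    }


if __name__ == "__main__":
    # Amplitude 1/sqrt(2): accepted in the N sense, undefined for E and B.
    print(acceptance(0.5 ** 0.5))
```

Whenever the E condition gives a definite answer, the N and B conditions agree with it, which mirrors the inclusions E-C ⊆ N-C and E-C ⊆ B-C.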

Next are some variations on familiar definitions from algebra. 

Definition 2.4 Let k > 0. A subset $\{\beta_i\}_{1 \le i \le k}$ of ℂ is linearly independent if $\sum_{i=1}^{k} a_i \beta_i \ne 0$
for any $(a_1, \ldots, a_k) \in \mathbb{Q}^k - \{0^k\}$. A set $\{\beta_i\}_{1 \le i \le k}$ is algebraically independent if the only
$p \in \mathbb{Q}[x_1, \ldots, x_k]$ with $p(\beta_1, \ldots, \beta_k) = 0$ is the zero polynomial.

We now briefly mention some closure properties of TC° computable functions that are useful in
proving NQACC^log_gates ⊆ TC°. For proofs of the statements in the next lemma see [28] and [10].



Lemma 2.5 (1) TC° functions are closed under composition. (2) The following are TC° computable:
$x + y$, $x \dot{-} y := x - y$ if $x - y > 0$ and $0$ otherwise, $|x| := \lceil \log_2(x + 1) \rceil$, $x \cdot y$,
$\lfloor x/y \rfloor$, $2^{\min(i, p(|x|))}$, and $\mathrm{cond}(x, y, z) := y$ if $x > 0$ and $z$ otherwise. (3) If $f(i, x)$ is TC° computable
then $\sum_{k=0}^{p(|x|)} f(k, x)$, $\prod_{k=0}^{p(|x|)} f(k, x)$, $\forall i \le p(|x|)\,(f(i, x) = 0)$, $\exists i \le p(|x|)\,(f(i, x) = 0)$, and
$\mu i \le p(|x|)\,(f(i, x) = 0)$ := the least $i$ such that $f(i, x) = 0$ and $i \le p(|x|)$, or $p(|x|) + 1$ otherwise, are
TC° computable.

We drop the min from $2^{\min(i, p(|x|))}$ when it is obvious a suitably large $p(|x|)$ can be found. We
define $\max(x, y) := \mathrm{cond}(1 \dot{-} (y \dot{-} x), x, y)$ and define

$$\max_{i \le p(|x|)}(f(i)) := \mu i \le p(|x|)\,\big(\forall j \le p(|x|)\,(f(j) \dot{-} f(i) = 0)\big)$$
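
To make the primitives concrete, here is a sequential Python sketch of them. It only models the input-output behaviour of the functions and says nothing about circuit depth. It reads the definition of max as max(x, y) := cond(1 ∸ (y ∸ x), x, y) and the bounded maximum as the least index whose value dominates all others; the Python names are hypothetical.

```python
def monus(x, y):
    """Truncated subtraction: x - y if positive, else 0."""
    return x - y if x - y > 0 else 0


def cond(x, y, z):
    """y if x > 0, else z."""
    return y if x > 0 else z


def length(x):
    """|x| = ceil(log2(x + 1)), i.e. the number of bits of x."""
    return x.bit_length()


def max2(x, y):
    """max(x, y) built only from cond and monus."""
    return cond(monus(1, monus(y, x)), x, y)


def mu(bound, pred):
    """Least i <= bound with pred(i) == 0, or bound + 1 if there is none."""
    for i in range(bound + 1):
        if pred(i) == 0:
            return i
    return bound + 1


def argmax(bound, f):
    """Least i <= bound such that f(j) monus f(i) == 0 for every j <= bound."""
    def fails(i):
        return 0 if all(monus(f(j), f(i)) == 0 for j in range(bound + 1)) else 1
    return mu(bound, fails)


if __name__ == "__main__":
    values = [2, 9, 4, 9, 1]
    print(max2(3, 7), argmax(len(values) - 1, values.__getitem__))
```

Note that the bounded maximum returns an index rather than a value, so the largest value itself is f applied to the result.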

Using the above functions we describe a way to do sequence coding in TC°. Let
$\beta_{|t|}(x, w) := \lfloor (w \dot{-} \lfloor w/2^{(x+1)|t|} \rfloor \cdot 2^{(x+1)|t|}) / 2^{x|t|} \rfloor$. The function $\beta$ is useful for block coding. Roughly, $\beta$ first gets rid
of the bits after the $(x+1)|t|$th bit then chops off the low order $x|t|$ bits. Let $B = 2^{|\max(x, y)|}$, so that
B is longer than either x or y. Hence, we code pairs as $\langle x, y \rangle := (B + y) \cdot 2B + B + x$, and projections
as $(w)_1 := \beta_{\lfloor \frac{1}{2}|w| \rfloor \dot{-} 1}(0, \beta_{\lfloor \frac{1}{2}|w| \rfloor}(0, w))$ and $(w)_2 := \beta_{\lfloor \frac{1}{2}|w| \rfloor \dot{-} 1}(0, \beta_{\lfloor \frac{1}{2}|w| \rfloor}(1, w))$. We can encode a
poly-length, TC° computable sequence of numbers $\langle f(1), \ldots, f(k) \rangle$ as the pair $\langle \sum_i (f(i) 2^{i \cdot m}), m \rangle$ where
$m := |f(\max_i(f(i)))| + 1$. We then define the function which projects out the ith member of a
sequence as $\beta(i, w) := \beta_{(w)_2}(i, (w)_1)$.
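
The block-coding scheme can be sanity-checked with a short sequential sketch. The Python below follows the formulas as read here: β with a block length, the pairing (B + y)·2B + B + x with B = 2^{|max(x, y)|}, and sequence entries stored in blocks of m bits starting at block 1. All function names are hypothetical, and the code illustrates only the arithmetic, not a TC° circuit.

```python
def length(x):
    """|x|: number of bits of x."""
    return x.bit_length()


def block(size, x, w):
    """beta_size(x, w): the x-th block of `size` bits of w (block 0 is lowest)."""
    high = (w >> ((x + 1) * size)) << ((x + 1) * size)  # bits above the block
    return (w - high) >> (x * size)                      # drop bits below it


def pair(x, y):
    """<x, y> := (B + y) * 2B + B + x with B = 2^{|max(x, y)|}."""
    b = 1 << length(max(x, y))
    return (b + y) * 2 * b + b + x


def first(w):
    """(w)_1: strip the leading marker bit from the low half of w."""
    half = length(w) // 2
    return block(half - 1, 0, block(half, 0, w))


def second(w):
    """(w)_2: strip the leading marker bit from the high half of w."""
    half = length(w) // 2
    return block(half - 1, 0, block(half, 1, w))


def encode_sequence(values):
    """Store f(1), ..., f(k) in consecutive m-bit blocks, paired with m."""
    m = length(max(values)) + 1
    total = sum(v << (i * m) for i, v in enumerate(values, start=1))
    return pair(total, m)


def entry(i, w):
    """beta(i, w): the i-th member of the coded sequence."""
    return block(second(w), i, first(w))


if __name__ == "__main__":
    w = encode_sequence([5, 0, 12, 7])
    print([entry(i, w) for i in range(1, 5)])
```

The leading B in each half acts as a marker bit, which is what lets the projections recover the block length from |w| alone.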

We can code integers using the positive natural numbers by letting the negative integers be 
the odd natural numbers and the positive integers be the even natural numbers. TC° can use the
TC° circuits for natural numbers to compute both the polynomial sum and polynomial product
of a sequence of TC° definable integers. It can also compute the rounded quotient of two such 
integers. For instance, to do a polynomial sum of integers, compute the natural number which is 
the sum of the positive numbers in the sum using cond and our natural number iterated addition 
circuit. Then compute the natural number which is the sum of the negative numbers in the sum. 
Use the subtraction circuit to subtract the smaller from the larger number and multiply by two. 
One is then added if the number should be negative. For products, we compute the product of 
the natural numbers which results by dividing each integer code by two and rounding down. We 
multiply the result by two. We then sum the number of terms in our product which were negative 
integers. If this number is odd we add one to the product we just calculated. Finally, division 
can be computed using the Taylor expansion of 1/x. 
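
The sign-coding of integers and the sum and product procedures just described can be mirrored in a few lines. This Python sketch is an illustration under the stated coding (even codes for non-negative integers, odd codes for negative ones); the function names are hypothetical, and the loops stand in for the iterated addition and multiplication circuits.

```python
def encode(n):
    """Code an integer: 2n for n >= 0, 2|n| + 1 for n < 0."""
    return 2 * n if n >= 0 else 2 * (-n) + 1


def decode(c):
    """Inverse of encode (the code 1 decodes to 0)."""
    return -(c // 2) if c % 2 else c // 2


def coded_sum(codes):
    """Sum positives and negatives separately, subtract, then restore the sign."""
    pos = sum(c // 2 for c in codes if c % 2 == 0)
    neg = sum(c // 2 for c in codes if c % 2 == 1)
    if pos >= neg:
        return 2 * (pos - neg)
    return 2 * (neg - pos) + 1


def coded_product(codes):
    """Multiply magnitudes, then add one if the number of negative terms is odd."""
    magnitude = 1
    for c in codes:
        magnitude *= c // 2
    negatives = sum(c % 2 for c in codes)
    return 2 * magnitude + (negatives % 2)


if __name__ == "__main__":
    print(decode(coded_sum([encode(n) for n in (3, -5, 7, -1)])))
    print(decode(coded_product([encode(n) for n in (3, -5, 2)])))
```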

Figure 2. Two ways to make a cat state on n qubits. The circuit on the left uses only two-qubit
gates and has depth log n. On the right, we define a "fanout gate" that simultaneously performs n
controlled-nots from one input qubit.



3 Fanout, Cat States, and Parity 

To make a shallow parallel circuit, it is often important to fan out one of the inputs into multiple 
copies. One of the differences between classical circuits and quantum ones as we have defined them 
here is that in classical circuits, we usually assume that we get arbitrary fanout for free, simply by 
splitting a wire into as many copies as we like. This is difficult in quantum circuits, since making 
an unentangled copy requires non-unitary, and in fact non-linear, processes: 

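$$(\alpha|0\rangle + \beta|1\rangle) \otimes (\alpha|0\rangle + \beta|1\rangle) = \alpha^2|00\rangle + \alpha\beta(|01\rangle + |10\rangle) + \beta^2|11\rangle$$

has coefficients quadratic in $\alpha$ and $\beta$, so it cannot be derived from $\alpha|0\rangle + \beta|1\rangle$ using any linear
operator, let alone a unitary one. This is one form of the so-called "no cloning" theorem.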

However, the controlled-not gate can be used to copy a qubit onto a work bit in the pure state 
|0) by making a non-destructive measurement: 

$$(\alpha|0\rangle + \beta|1\rangle) \otimes |0\rangle \to \alpha|00\rangle + \beta|11\rangle$$

Note that the final state is not a tensor product of two independent qubits, since the two qubits 
are completely entangled. This means that whatever we do to one copy, we do to the other. 
Except when the states are purely Boolean, we have to treat this kind of "fanout" more gingerly 
than we would in the classical case. 
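
The difference between an independent copy and a controlled-not "copy" can be seen directly on amplitude vectors. The following Python sketch is illustrative only, with hypothetical function names; amplitudes are listed in the order |00⟩, |01⟩, |10⟩, |11⟩.

```python
def independent_copies(a, b):
    """Amplitudes of (a|0> + b|1>) tensor (a|0> + b|1>): quadratic in a and b."""
    return [a * a, a * b, b * a, b * b]


def cnot_copy(a, b):
    """Amplitudes after a controlled-not from a|0> + b|1> onto a work bit |0>."""
    start = [a, 0, b, 0]  # (a|0> + b|1>) tensor |0>
    # The controlled-not swaps |10> and |11> and leaves |00> and |01> alone.
    return [start[0], start[1], start[3], start[2]]


if __name__ == "__main__":
    # On Boolean inputs the two notions of copying agree ...
    print(independent_copies(1, 0), cnot_copy(1, 0))
    # ... but on a superposition only the controlled-not result is linear
    # in (a, b), and it is entangled rather than a product state.
    print(independent_copies(0.6, 0.8), cnot_copy(0.6, 0.8))
```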

By making n copies of a qubit in this sense, we can make a "cat state" $\alpha|000 \cdots 0\rangle + \beta|111 \cdots 1\rangle$.
Such states are useful in making quantum computation fault-tolerant. We can do
this in log n depth with controlled-not gates, as shown on the left-hand side of Figure 2. When
preceded by a Hadamard gate on the top qubit, this circuit will map an initial state $|0000\rangle$ onto
a cat state $\frac{1}{\sqrt{2}}(|0000\rangle + |1111\rangle)$. However, we will also consider circuits which can do this in a
single layer, with a "fanout gate" that simultaneously copies a qubit onto n target qubits. This
is simply the product of n controlled-not gates, as shown on the right-hand side of Figure 2.
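
The two ways of building a cat state can be compared with a small state-vector sketch. In the Python below, a state is a dictionary from bit tuples to amplitudes, qubit 0 plays the role of the top qubit, and the particular doubling pattern used for the log-depth circuit is an assumption of this sketch; the function names are hypothetical.

```python
def apply_cnots(state, pairs):
    """Apply a layer of controlled-nots, given as (control, target) pairs.

    Within a layer no control is also a target, so applying the pairs one
    after another is the same as applying them simultaneously.
    """
    new_state = {}
    for bits, amp in state.items():
        b = list(bits)
        for control, target in pairs:
            b[target] ^= b[control]
        key = tuple(b)
        new_state[key] = new_state.get(key, 0) + amp
    return new_state


def tree_layers(n):
    """Doubling scheme: each layer copies the qubits made so far onto new ones."""
    layers, have = [], 1
    while have < n:
        layers.append([(c, c + have) for c in range(have) if c + have < n])
        have *= 2
    return layers


def cat_by_tree(state, n):
    for layer in tree_layers(n):
        state = apply_cnots(state, layer)
    return state


def cat_by_fanout(state, n):
    """A single layer: controlled-nots from qubit 0 to every other qubit."""
    return apply_cnots(state, [(0, t) for t in range(1, n)])


if __name__ == "__main__":
    start = {(0, 0, 0, 0): 0.6, (1, 0, 0, 0): 0.8}
    print(len(tree_layers(4)), cat_by_tree(start, 4), cat_by_fanout(start, 4))
```

Both circuits agree when the work bits start in |0⟩, but they can act differently on other basis states, for example on (0, 1, 0, 0).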

We now show that in quantum circuits, we can do fanout in constant depth if and only if we 
can construct a parity gate in constant depth. 

Proposition 3.1 In any class of quantum circuits that includes Hadamard and controlled-not
gates, the following are equivalent:

1. It is possible to map $\alpha|0\rangle + \beta|1\rangle$ and n − 1 work bits in the state $|0\rangle$ onto an n-qubit cat state
$\alpha|000 \cdots 0\rangle + \beta|111 \cdots 1\rangle$ in constant depth.

2. The n-ary fanout gate on the right-hand side of Figure 2 can be implemented in constant
depth with at most n − 1 additional work bits.

3. An n-ary parity or MOD_2 gate as defined above can be implemented in constant depth with
at most n − 1 additional work bits.

Figure 3. The parity and fanout gates are conjugates of each other by a layer of Hadamard gates.



Proof. First, note that (1) is a priori weaker than (2), since (1) only requires that an operator
map $|100 \cdots 0\rangle$ to $|111 \cdots 1\rangle$ and $|000 \cdots 0\rangle$ to itself. In fact, the two circuits shown in Figure 2
both do this, even though they differ on other initial states.

To prove (2 ⇔ 3), we simply need to notice that the parity gate is a fanout gate going the other
way conjugated by a layer of Hadamard gates, since parity is simply a product of controlled-nots
with the same target qubit, and conjugating with H reverses the direction of a controlled-not.
This is shown in Figure 3. Clearly the number of work bits used to perform either gate will be
the same. (We prove this equivalence in greater detail and generality in a proposition below.)

To prove (1 ⇒ 3), we use a slightly more elaborate circuit shown in Figure 4. Here we use the
identity shown in Figure 1 to convert the parity gate into a product of controlled π-shifts. Since
these are diagonal, they can be parallelized by copying the target qubit onto n − 1 work
bits, and applying each one to a different copy. While we have drawn the circuit with two fanout
gates, any gate that satisfies the conditions in (1), and its inverse after the π-shifts, will do.

Finally, (2 ⇒ 1) is obvious. □
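
The conjugation identity behind (2 ⇔ 3) can be checked numerically for small n. The Python sketch below builds the fanout and parity gates as permutation matrices and uses an unnormalized Hadamard layer with entries ±1, so that the identity reads H F H = 2^n · P in exact integer arithmetic. The bit ordering (bit 0 is the shared control or target) and the function names are assumptions of this sketch.

```python
def hadamard_layer(n):
    """Unnormalized H on every qubit: entry (i, j) is (-1)^(i . j)."""
    size = 1 << n
    return [[(-1) ** bin(i & j).count("1") for j in range(size)]
            for i in range(size)]


def permutation_matrix(n, f):
    """Matrix sending basis state j to basis state f(j)."""
    size = 1 << n
    m = [[0] * size for _ in range(size)]
    for j in range(size):
        m[f(j)][j] = 1
    return m


def fanout(n):
    """Bit 0 is the control x; bits 1..n-1 are targets y_i -> y_i XOR x."""
    targets = ((1 << n) - 1) & ~1
    return permutation_matrix(n, lambda j: j ^ targets if j & 1 else j)


def parity(n):
    """Bit 0 is the target x -> x XOR (y_1 XOR ... XOR y_{n-1})."""
    return permutation_matrix(n, lambda j: j ^ (bin(j >> 1).count("1") & 1))


def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def conjugated_fanout(n):
    """H^{(x)n} F H^{(x)n} with unnormalized H, i.e. 2^n times the true product."""
    h = hadamard_layer(n)
    return matmul(matmul(h, fanout(n)), h)


if __name__ == "__main__":
    n = 3
    scaled_parity = [[(1 << n) * v for v in row] for row in parity(n)]
    print(conjugated_fanout(n) == scaled_parity)
```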

This brings up an interesting issue. It is not clear that a QAC° operator as we have defined QAC°
here can "simulate" any AC° circuit, since we are not allowing arbitrary fanout in each layer. An
alternate definition, which we might call QAC with fanout or QAC_wf, would allow us to perform
controlled-U gates or Toffoli gates in the same layer whenever they have different target qubits,
even if their input qubits overlap. This seems reasonable, since these gates commute. Since we
can fan out to n copies in log n layers as in Figure 2, we have QAC^k ⊆ QAC^k_wf ⊆ QAC^{k+1}. We can
define QACC_wf in the same way, and Proposition 3.1 implies that QAC^k_wf = QACC^k_wf[2] = QACC^k[2].

It is partly a matter of taste whether QAC° or QAC°_wf is a better analog of AC°. However,
fanout does seem possible in several proposed technologies for quantum computing. In an ion
trap computer, vibrational modes can couple with all the atoms simultaneously, so we could
apply a controlled-not from one atom to the "bus qubit" and then from the bus to the other n
atoms. In bulk-spin NMR, we can activate the couplings from one atom to n others, and
perform n controlled π-shifts simultaneously, which is equivalent to fanout with the target qubits
conjugated with the Hadamard gate. Thus allowing fanout may in fact be the most reasonable
model of constant-depth quantum circuits.

Figure 4. The parity gate can also be written as a product of controlled π-shifts, with the target qubit
conjugated by H. Since these are diagonal, we can parallelize them using any gate that can make a
cat state.

4 Constant Depth Equivalence of MOD_p and MOD_q Gates

As stated in the Introduction, in the classical case Mod_p and Mod_q gates are not easy to build
from each other whenever p and q are relatively prime. In fact, to do it in constant depth requires
a circuit of exponential size. In this section, we will show this is not true in the quantum
case. Specifically, we show that any MOD_q gate can be built in constant depth from any MOD_p
gate, for any two numbers p and q. We start by showing that any MOD_q gate can be built from
parity gates in constant depth.

Proposition 4.1 In any circuit class containing n-ary parity gates and one-qubit gates, we can
construct an n-ary MOD_q gate, with O(n log q) work bits, in depth depending only on q.

Proof. Let $k = \lceil \log_2 q \rceil$, and let M be a Boolean matrix on k qubits where the zero state has
period q. For instance, if we write $|x\rangle$ as shorthand for $|x_{k-1} \cdots x_1 x_0\rangle$ where $x_i$ is the $2^i$ digit of
x's binary expansion and $0 \le x < 2^k$, we can define M so that it permutes the $|x\rangle$ as follows:

$$M|x\rangle = \begin{cases} |(x + 1) \bmod q\rangle & \text{if } x < q \\ |x\rangle & \text{if } x \ge q \end{cases}$$

Then if we start with k work bits in the state $|0\rangle$ and apply a controlled-M gate to them from
each input, the state will differ from $|0\rangle$ on at least one qubit if and only if the number of true
inputs is not a multiple of q. (Note that this controlled-M gate applies to k target qubits at once
in an entangled way.) We can then apply a k-ary OR of these k qubits to the target qubit, i.e.
a Toffoli gate with its inputs conjugated with X and its target qubit negated before or after the
gate. We end by applying the inverse series of controlled-$M^\dagger$ gates to return the k work bits to
$|0\rangle$.
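
The counting idea in this first part of the proof can be followed on classical inputs. The Python sketch below applies M once per true input to a k-bit register, ORs the register into the target, and then uncomputes. It runs the controlled-M gates one after another rather than in parallel, so it illustrates correctness of the construction but not its constant depth; the function names are hypothetical.

```python
def register_size(q):
    """k = ceil(log2 q), computed without floating point."""
    return (q - 1).bit_length()


def step(x, q):
    """M|x> = |(x + 1) mod q> if x < q, and |x> otherwise."""
    return (x + 1) % q if x < q else x


def step_back(x, q):
    """The inverse permutation of `step`."""
    return (x - 1) % q if x < q else x


def mod_q_via_counter(inputs, target, q):
    """Return (new target, final register) for Boolean inputs."""
    register = 0
    for bit in inputs:                 # controlled-M from each input
        if bit:
            register = step(register, q)
    assert register < (1 << register_size(q))
    target ^= int(register != 0)       # OR of the k register bits
    for bit in reversed(inputs):       # uncompute the work bits
        if bit:
            register = step_back(register, q)
    return target, register


if __name__ == "__main__":
    print(mod_q_via_counter([1, 1, 0, 1, 1], 0, 3))
```

The register returns to 0 at the end, matching the requirement that work bits be restored so that they are disentangled from the rest of the circuit.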



Now we use Proposition 4 of [18] to parallelize this set of controlled-M gates. We can convert
them to diagonal gates by conjugating the k qubits with a unitary operator T, where $T^\dagger D T = M$
and D is diagonal. If we have a parity gate, we can fan out the k work bits to n copies each using
Proposition 3.1. We can then simultaneously apply the n controlled-D gates from each input to
the corresponding copy, and then uncopy them back.

This is shown in Figure 5. For q = 3, for instance,

$$M = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad T = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & e^{4\pi i/3} & e^{2\pi i/3} & 0 \\ 1 & e^{2\pi i/3} & e^{4\pi i/3} & 0 \\ 0 & 0 & 0 & \sqrt{3} \end{pmatrix}, \quad \text{and} \quad D = \mathrm{diag}\left(1, e^{4\pi i/3}, e^{2\pi i/3}, 1\right).$$
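
As a numerical sanity check of the q = 3 example, the following Python sketch verifies the relation T†DT = M with the standard library only. The matrix M is the cyclic shift x → (x + 1) mod 3 defined in the proof and T is the 3-point Fourier-type matrix padded with a fixed fourth state. The diagonal entries of D used below are an assumption of this sketch, chosen so that the relation holds; the helper names are hypothetical.

```python
import cmath
from math import sqrt

w = cmath.exp(2j * cmath.pi / 3)   # primitive cube root of unity
s = 1 / sqrt(3)

# M|x> = |(x + 1) mod 3> for x < 3, and M|3> = |3>.
M = [[0, 0, 1, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1]]

T = [[s, s, s, 0],
     [s, s * w ** 2, s * w, 0],
     [s, s * w, s * w ** 2, 0],
     [0, 0, 0, 1]]

# Assumed diagonal: eigenvalues 1, e^{4 pi i/3}, e^{2 pi i/3}, 1.
D = [[1, 0, 0, 0],
     [0, w ** 2, 0, 0],
     [0, 0, w, 0],
     [0, 0, 0, 1]]


def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def close(a, b, tol=1e-9):
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))


if __name__ == "__main__":
    print(close(matmul(matmul(dagger(T), D), T), M))
```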




The operators $T$, $T^\dagger$, and the controlled-$D$ gate can be carried out in some finite depth by
controlled-nots and one-qubit gates by the results of Barenco et al. The total depth of our $\mathrm{MOD}_q$ gate is a
function of these and so of $q$, but not of $n$. Finally, the number of work bits used is $(n-1)k =
O(n \log q)$ as promised. □
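As a sanity check on the $q = 3$ example, the relation $T^\dagger D T = M$ can be verified numerically. This is a small sketch in plain Python; the helper names are hypothetical, and the diagonal matrix `D` below is an assumption: it is the one forced by $D = T M T^\dagger$ for the Fourier-type $T$ written here, so it should be read as a reconstruction rather than a quotation.

```python
import cmath

w = cmath.exp(2j * cmath.pi / 3)          # primitive cube root of unity
s = 3 ** -0.5

# Matrices for q = 3 on k = 2 qubits (basis order |0>, |1>, |2>, |3>).
M = [[0, 0, 1, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1]]
T = [[s, s,        s,        0],
     [s, s * w**2, s * w,    0],
     [s, s * w,    s * w**2, 0],
     [0, 0,        0,        1]]
D = [[1, 0,    0, 0],             # assumed: D = T M T^dagger
     [0, w**2, 0, 0],
     [0, 0,    w, 0],
     [0, 0,    0, 1]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(A):
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def close(A, B, tol=1e-9):
    n = len(A)
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(n) for j in range(n))

conjugated = matmul(dagger(T), matmul(D, T))   # should reproduce M
```

Since $D$ is diagonal, the $n$ controlled-$D$ gates commute with one another, which is what allows them to be applied in a single layer on the fanned-out copies.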

To look more closely at the depth as a function of $q$, we note that using the methods of Reck et
al. and Barenco et al., any operator on $k$ qubits can be performed with $O(k^3 4^k)$ two-qubit
gates. Since $k = \lceil \log_2 q \rceil$, this means that the depths of $T$, $T^\dagger$ and the controlled-$D$ gates are at
most $O(q^2 \log^3 q)$.

Since we can construct $\mathrm{MOD}_q$ gates in constant depth, we have $\mathrm{QACC}^k[q] \subseteq \mathrm{QACC}^k[2]$ for all
$q$, so $\mathrm{QACC}^k = \mathrm{QACC}^k[2]$. By Proposition 3.1, these are both also equal to $\mathrm{QAC}^k_{wf}$. In particular
we have

$$\mathrm{QAC}^0_{wf} = \mathrm{QACC}[2] = \mathrm{QACC}$$

while classically both equalities are strict inclusions. Note that allowing fanout immediately gives
$\mathrm{QACC}_{wf}[q] = \mathrm{QACC}[2] = \mathrm{QACC}$ for any $q$, but we will show below that including fanout explicitly
is not necessary.

We now show the converse $\mathrm{QACC}[2] \subseteq \mathrm{QACC}[q]$ for any $q$, i.e. parity can be built from $\mathrm{MOD}_q$
for any $q$. This shows the constant depth equivalence of $\mathrm{MOD}_q$ gates for all $q$.

Let $q \in \mathbb{N}$, $q \ge 2$ be fixed for the remainder of this section. Consider quantum states labeled
by digits in $D = \{0, \ldots, q-1\}$. By analogy with "qubit," we refer to a state of the form

$$\sum_{k=0}^{q-1} c_k |k\rangle$$

with $\sum_k |c_k|^2 = 1$ as a "qudigit."

We define three important operations on qudigits. The $n$-ary modular addition operator $M_q$
acts as follows:

$$M_q |x_1, \ldots, x_n, b\rangle = |x_1, \ldots, x_n, (b + x_1 + \cdots + x_n) \bmod q\rangle$$

We use the same graphical notation for $M_q$ as we do for a $\mathrm{MOD}_q$ gate, but interpreting the lines
as qudigits, as illustrated in Figure 6.

Since $M_q$ merely permutes the states, it is clear that it is unitary. Similarly, the $n$-ary unitary
base $q$ fanout operator $F_q$ acts as

$$F_q |x_1, \ldots, x_n, b\rangle = |(x_1 + b) \bmod q, \ldots, (x_n + b) \bmod q, b\rangle$$



We write $F$ for $F_2$, since it is the "standard" fan-out gate introduced in Definition 2.1. Note that
$M_q^{-1} = M_q^{q-1}$ and $F_q^{-1} = F_q^{q-1}$.

Figure 5. Building a $\mathrm{MOD}_q$ gate. We choose a matrix $M$ on $k = \lceil \log_2 q \rceil$ qubits such that $M^q = 1$,
apply controlled-$M$ gates from the $n$ inputs to $k$ work bits, apply an OR from these $k$ qubits to
the target qubit, and reverse the process to return the work bits to $|0\rangle$. To parallelize this, we can
diagonalize $M$ by writing it as $T^\dagger D T$, fan the $k$ qubits out into $n$ copies each using Proposition 3.1,
and apply controlled-$D$ gates simultaneously from each input to a set of copies. The total depth
depends on $q$ but not on $n$.

Figure 6. An $M_q$ gate, which maps the last qudigit $b$ to $(b + x_1 + \cdots + x_n) \bmod q$.



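Finally, the Quantum Fourier Transform $H_q$ (which generalizes the Hadamard transform $H$ on
qubits) acts on a single qudigit as

$$H_q |a\rangle = \frac{1}{\sqrt{q}} \sum_{b=0}^{q-1} \zeta^{ab} |b\rangle$$

where $\zeta = e^{2\pi i/q}$ is a primitive complex $q$th root of unity. It is easy to see that $H_q$ is unitary, via
the fact that $\sum_{\ell=0}^{q-1} \zeta^{a\ell} = 0$ iff $a \not\equiv 0 \bmod q$.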

The first observation is that, analogous to parity and fanout for Boolean inputs, the operators
$M_q$ and $F_q$ are "conjugates" in the following sense. This is a generalization of the equivalence of
assertions (2) and (3) of Proposition 3.1.

Proposition 4.2 $M_q = (H_q^{\otimes(n+1)})^{-1} F_q^{-1} H_q^{\otimes(n+1)}$.

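Proof. We apply the operators $H_q^{\otimes(n+1)}$, $F_q^{-1}$, and $(H_q^{\otimes(n+1)})^{-1}$ in that order to the state
$|x_1, \ldots, x_n, b\rangle$, and check that the result has the same effect as $M_q$.

The operator $H_q^{\otimes(n+1)}$ simply applies $H_q$ to each of the $n+1$ qudigits of $|x_1, \ldots, x_n, b\rangle$, which
yields

$$\frac{1}{q^{(n+1)/2}} \sum_{y \in D^n} \sum_{a=0}^{q-1} \zeta^{x \cdot y + ab} \, |y_1, \ldots, y_n, a\rangle$$

where $y$ is a compact notation for $y_1, \ldots, y_n$, and $x \cdot y$ denotes $\sum_{i=1}^{n} x_i y_i$. Then applying $F_q^{-1}$ to
the above state yields

$$\frac{1}{q^{(n+1)/2}} \sum_{y \in D^n} \sum_{a=0}^{q-1} \zeta^{x \cdot y + ab} \, |(y_1 - a) \bmod q, \ldots, (y_n - a) \bmod q, a\rangle.$$

By a change of variable, the above can be re-written as

$$\frac{1}{q^{(n+1)/2}} \sum_{y \in D^n} \sum_{a=0}^{q-1} \zeta^{x \cdot y + a(b + x_1 + \cdots + x_n)} \, |y_1, \ldots, y_n, a\rangle.$$

Finally, applying $(H_q^{\otimes(n+1)})^{-1}$ to the above undoes the Fourier transform and puts the coefficient
of $a$ in the exponent into the last slot of the state.
The result is

$$(H_q^{\otimes(n+1)})^{-1} F_q^{-1} H_q^{\otimes(n+1)} |x_1, \ldots, x_n, b\rangle = |x_1, \ldots, x_n, (b + x_1 + \cdots + x_n) \bmod q\rangle,$$

which is exactly what $M_q$ would yield. □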
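The identity of Proposition 4.2 can also be checked by brute force for small parameters. The sketch below is illustrative only: states are stored as dictionaries from basis tuples to amplitudes, and all function names are hypothetical.

```python
import cmath
from itertools import product

def fourier_all(state, q, sign=+1):
    """Apply H_q (sign=+1) or its inverse (sign=-1) to every qudigit."""
    zeta = cmath.exp(sign * 2j * cmath.pi / q)
    out = {}
    for x, amp in state.items():
        norm = q ** (-len(x) / 2)
        for y in product(range(q), repeat=len(x)):
            dot = sum(a * b for a, b in zip(x, y))
            out[y] = out.get(y, 0) + amp * norm * zeta ** (dot % q)
    return out

def fanout_inverse(state, q):
    """F_q^{-1}: |y_1..y_n, a> -> |(y_1 - a) mod q, ..., (y_n - a) mod q, a>."""
    out = {}
    for basis, amp in state.items():
        *ys, a = basis
        key = tuple((y - a) % q for y in ys) + (a,)
        out[key] = out.get(key, 0) + amp
    return out

def conjugated_fanout(basis, q):
    """Apply H^{(n+1)}, then F_q^{-1}, then the inverse of H^{(n+1)}."""
    state = fourier_all({basis: 1.0}, q)
    state = fanout_inverse(state, q)
    return fourier_all(state, q, sign=-1)

def modular_add(basis, q):
    """M_q on a basis state: add all x_i into the last digit, mod q."""
    *xs, b = basis
    return tuple(xs) + ((b + sum(xs)) % q,)
```

For every basis state, `conjugated_fanout` should return amplitude 1 on `modular_add(basis, q)` and amplitude 0 elsewhere, up to rounding error.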

We now describe how the operators $M_q$, $F_q$ and $H_q$ can be modified to operate on registers
consisting of qubits rather than qudigits. Firstly, we encode each digit using $\lceil \log q \rceil$ bits. Thus,
for example, when $q = 3$, the basis states $|0\rangle$, $|1\rangle$ and $|2\rangle$ are represented by the two-qubit registers
$|00\rangle$, $|01\rangle$ and $|10\rangle$, respectively. Note that there remains one state (in the example, $|11\rangle$) which
does not correspond to any of the qudigits. In general, there will be $2^{\lceil \log q \rceil} - q$ such "non-qudigit"
states. $M_q$, $F_q$ and $H_q$ can now be defined to act on qubit registers, as follows. Consider a state
$|x\rangle$ where $x$ is a number represented as $m$ bits (i.e., an $m$-qubit register). If $m < \lceil \log q \rceil$, then
$H_q$ leaves $|x\rangle$ unaffected. If $0 \le x \le q - 1$ (where here we are identifying $x$ with the number it
represents), then $H_q$ acts exactly as one expects, namely, $H_q|x\rangle = (1/\sqrt{q}) \sum_{y=0}^{q-1} \zeta^{xy} |y\rangle$. If $x \ge q$,
again $H_q$ leaves $|x\rangle$ unchanged. Since the resulting transformation is a direct sum of unit matrices
and matrices of the form of $H_q$ as it was originally set down, the result is a unitary transformation.
$M_q$ and $F_q$ can be defined to operate similarly on $m$-qubit registers for any $m$: Break up the $m$
bits into blocks of $\lceil \log q \rceil$ bits. If $m$ is not divisible by $\lceil \log q \rceil$, then $M_q$ and $F_q$ do not affect
the "remainder" block that contains fewer than $\lceil \log q \rceil$ bits. Likewise, in a quantum register
$|x_1, \ldots, x_n\rangle$ where each of the $x_i$'s (with the possible exception of $x_n$) are $\lceil \log q \rceil$-bit numbers, $M_q$
and $F_q$ operate on the blocks of bits $x_1, \ldots, x_n$ exactly as expected, except that there is no effect
on the "non-qudigit" blocks (in which $x_i \ge q$), or on the (possibly) one remainder block for which
$|x_n| < \lceil \log q \rceil$. Since $M_q$ and $F_q$ operate exactly as they did originally on blocks representing
qudigits, and like unity for non-qudigit or remainder blocks, it is clear that they remain unitary.

Henceforth, $M_q$, $F_q$, and $H_q$ should be understood to act on qubit registers as described above.
Nevertheless, it will usually be convenient to think of them as acting on qudigit registers consisting
of $\lceil \log q \rceil$ qubits in each.

Lemma 4.3 $F_q$ and $M_q$ are $\mathrm{QAC}^0$-equivalent.

Proof. By Barenco et al., any fixed dimension unitary matrix can be computed in fixed
depth using one-qubit gates and controlled nots. Hence $H_q$ can be computed in $\mathrm{QAC}^0$, as can
$H_q^{\otimes(n+1)}$. The result now follows immediately from Proposition 4.2. □

Lemma 4.4 $\mathrm{MOD}_q$ and $M_q$ are $\mathrm{QAC}^0$-equivalent.

Proof. First note that $\neg\mathrm{MOD}_q$ and $\mathrm{MOD}_{q,r}$ are equivalent, since a $\mathrm{MOD}_{q,r}$ gate can be simulated
by a $\neg\mathrm{MOD}_q$ gate with $q - r$ extra inputs set to the constant 1. Since $\neg\mathrm{MOD}_q$ and $\mathrm{MOD}_q$ gates
are equivalent, we can freely use $\mathrm{MOD}_{q,r}$ gates in place of $\mathrm{MOD}_q$ gates and vice versa.

It is easy to see that, given an $M_q$ gate, we can simulate a $\mathrm{MOD}_q$ gate. Applying $M_q$ to $n + 1$
digits (represented as bits, but each digit only taking on the values 0 or 1) transforms

$$|x_1, \ldots, x_n, 0\rangle \mapsto |x_1, \ldots, x_n, (\textstyle\sum_i x_i) \bmod q\rangle.$$

Figure 7. A $\widehat{\mathrm{MOD}}_{3,r}$ circuit for $r = 0$. In the figure, mod(x) denotes $\mathrm{Mod}_{3,r}(x_1, \ldots, x_n)$. The
notation on the right will be used as a shorthand for this circuit.

Now send the bits of the last block ($(\sum_i x_i) \bmod q$) to an OR gate with control bit $b$ (see the
proof of Proposition 4.1). The resulting output is exactly $b \oplus \mathrm{Mod}_q(x_1, \ldots, x_n)$. The bits in the
last block can be erased by reversing the $M_q$ gate. This leaves only $x_1, \ldots, x_n$, $O(n)$ work bits, and
the output $b \oplus \mathrm{Mod}_q(x_1, \ldots, x_n)$.

The converse (simulating $M_q$ given $\mathrm{MOD}_q$) requires some more work. The first step is to show
that $\mathrm{MOD}_{q,0}$ can also determine if a sum of digits is divisible by $q$. Let $x_1, \ldots, x_n \in D$ be a set of
digits represented as $\lceil \log q \rceil$ bits each. For each $i$, let $x_i^{(k)}$ ($0 \le k \le \lceil \log q \rceil - 1$) denote the bits of
$x_i$. Since the numerical value of $x_i$ is $\sum_{k=0}^{\lceil \log q \rceil - 1} x_i^{(k)} 2^k$, it follows that

$$\sum_{i=1}^{n} x_i = \sum_{k=0}^{\lceil \log q \rceil - 1} \sum_{i=1}^{n} x_i^{(k)} 2^k.$$

The idea is to express this last sum in terms of a set of Boolean inputs that are fed into a
$\mathrm{MOD}_{q,0}$ gate. To account for the factors $2^k$, each $x_i^{(k)}$ is fanned out $2^k$ times before plugging it
into the $\mathrm{MOD}_{q,0}$ gate. Since $k < \lceil \log q \rceil$, this requires only constant depth and $O(n)$ work bits
(which of course are set back to 0 in the end by reversing the fanout). Thus, just using $\mathrm{MOD}_{q,0}$
and constant fanout, we can determine if $\sum_{i=1}^{n} x_i \equiv 0 \bmod q$. More generally, we can determine if
$\sum_{i=1}^{n} x_i \equiv r \bmod q$ using just a $\mathrm{MOD}_{q,r}$ gate and constant fanout. Let $\widehat{\mathrm{MOD}}_{q,r}(x_1, \ldots, x_n)$ denote
the resulting circuit, that determines if a sum of digits is congruent to $r$ mod $q$. The construction
of $\widehat{\mathrm{MOD}}_{q,r}(x_1, \ldots, x_n)$ is illustrated in Figure 7 for the case of $q = 3$.
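The fan-out trick above can be illustrated classically. In this sketch (the function names are hypothetical), each bit $x_i^{(k)}$ is replicated $2^k$ times, and the Boolean gate sees only the number of true inputs.

```python
def digit_bits(x, width):
    """Bits x^(0), ..., x^(width-1) of a digit, least significant first."""
    return [(x >> k) & 1 for k in range(width)]

def mod_q_r_on_digits(digits, q, r=0):
    """Decide whether sum(digits) = r (mod q) from a count of Boolean inputs."""
    width = (q - 1).bit_length()            # ceil(log2 q) bits per digit
    boolean_inputs = []
    for x in digits:
        for k, bit in enumerate(digit_bits(x, width)):
            boolean_inputs.extend([bit] * 2 ** k)   # fan bit k out 2^k times
    # A Boolean MOD_{q,r} gate depends only on the number of true inputs.
    return int(sum(boolean_inputs) % q == r)
```

Each digit contributes fewer than $2q$ Boolean inputs, which is why the number of extra work bits stays linear in $n$ for fixed $q$.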

We can get the bits in the value of the sum $\sum_{i=1}^{n} x_i \bmod q$ using $\widehat{\mathrm{MOD}}_{q,r}$ circuits. This is done,
essentially, by implementing the relation $x \bmod q = \sum_{r=0}^{q-1} r \cdot \mathrm{Mod}_{q,r}(x)$. For each $r$, $0 \le r \le q - 1$,
we compute $\widehat{\mathrm{Mod}}_{q,r}(x_1, \ldots, x_n)$ (where now the $x_i$'s are digits). This can be done by applying
the $\widehat{\mathrm{MOD}}_{q,r}$ circuits in series (for each $r$) to the same inputs, introducing a work bit for each
application, as illustrated in Figure 8.

Figure 8. Applying $\widehat{\mathrm{MOD}}_{q,r}$ circuits in series.

Let $r_k$ denote the $k$th bit of $r$. For each $r$ and for each $k$, we take the AND of the output of
the $\widehat{\mathrm{MOD}}_{q,r}$ with $r_k$ (again by applying the AND's in series, which is still constant depth, but
introduces $q$ extra work inputs). Let $a_{k,r}$ denote the output of one of these AND's. For each $k$,
we OR together all the $a_{k,r}$'s, that is, compute $\bigvee_{r=0}^{q-1} a_{k,r}$, again introducing a constant number of
work bits. Since only one of the $r$'s will give a non-zero output from $\widehat{\mathrm{MOD}}_{q,r}$, this collection of
OR gates outputs exactly the bits in the value of $\sum_{i=1}^{n} x_i \bmod q$. Call the resulting circuit $C$, and
the sum it outputs $S$.
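The AND/OR selection step can be sketched as follows. This is a classical illustration with a hypothetical function name; the residue indicators stand in for the outputs of the $q$ circuits applied in series.

```python
def sum_mod_q_bits(digits, q):
    """Bits of (sum of digits) mod q, built from q residue indicators."""
    width = (q - 1).bit_length()
    # One indicator per residue r; exactly one of them equals 1.
    indicator = [int(sum(digits) % q == r) for r in range(q)]
    bits = []
    for k in range(width):
        # a_{k,r} = indicator_r AND r_k, followed by an OR over all r.
        a_k = [indicator[r] & ((r >> k) & 1) for r in range(q)]
        bits.append(int(any(a_k)))
    return bits   # least significant bit first
```

Only the indicator for the true residue is nonzero, so the OR for bit position $k$ simply copies the $k$th bit of that residue.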

Finally, to simulate $M_q$, we need to include the input digit $b \in D$. To do this, we apply a
unitary transformation $T$ to $|S, b\rangle$ that transforms it to $|S, (b + S) \bmod q\rangle$. By Barenco et al.
(as in the proof of Lemma 4.3), $T$ can be computed in fixed depth using one-qubit gates and
controlled NOT gates. Now using $S$ and all the other work inputs, we reverse the computation of
the circuit $C$, thus clearing the work inputs. This is illustrated in Figure 9.

Figure 9. Combining circuits to compute $M_q$: the circuit $C$, then $T$, then $C^{-1}$, taking $b$ to $(b + S) \bmod q$.

The result is an output consisting of $x_1, \ldots, x_n$, $O(n)$ work bits, and $(b + \sum_i x_i) \bmod q$, which
is the output of an $M_q$ gate. □

It is clear that we can fan out digits, and therefore bits, using an $F_q$ gate (setting $x_i = 0$ for
$1 \le i \le n$ fans out $n$ copies of $b$). It is slightly less obvious (but still straightforward) that, given
an $F_q$ gate, we can fully simulate an $F$ gate.

Lemma 4.5 For any $q > 2$, $F$ and $F_q$ are $\mathrm{QAC}^0$-equivalent.

Proof. By the preceding lemmas, $F_q$ and $\mathrm{MOD}_q$ are $\mathrm{QAC}^0$-equivalent. By Proposition 4.1,
$\mathrm{MOD}_q$ is $\mathrm{QAC}^0$-reducible to $F$. Hence $F_q$ is $\mathrm{QAC}^0$-reducible to $F$.

Conversely, arrange each block of $\lceil \log q \rceil$ input bits to an $F_q$ gate as follows. For the control-bit
block (which contains the bit we want to fan out), set all but the last bit to zero, and call the
last bit $b$. Set all bits in the $i$th input-bit block to 0. Now the $i$th output of the $F_q$ circuit is $b$,
represented as $\lceil \log q \rceil$ bits with only one possibly nonzero bit. Send this last output bit $b$ and the
input bit $x_i$ to a controlled-NOT gate. The outputs of that gate are $b$ and $b \oplus x_i$. Now apply $F_q^{-1}$
to the bits that were the outputs of the $F_q$ gate (which are all left unchanged by the controlled-nots).
This returns all the $b$'s to 0 except for the control bit which is always unchanged. The
outputs of the controlled-nots give the desired $b \oplus x_i$. Thus the resulting circuit simulates $F$ with
$O(n)$ work bits. □
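The three stages of this simulation (copy with $F_q$, controlled-NOT, uncopy with $F_q^{-1}$) can be traced on basis states. The sketch below uses hypothetical names and treats each work block as a single digit whose low bit plays the role of the "last bit" of the block.

```python
def fanout_q(blocks, b, q):
    """F_q on basis states: add the control digit b to every block, mod q."""
    return [(x + b) % q for x in blocks]

def fanout_q_inverse(blocks, b, q):
    """F_q^{-1} on basis states: subtract the control digit b, mod q."""
    return [(x - b) % q for x in blocks]

def simulate_F(xs, b, q):
    """Compute (x_1 XOR b, ..., x_n XOR b) using F_q, CNOTs and F_q^{-1}."""
    work = [0] * len(xs)                           # one zeroed block per input
    work = fanout_q(work, b, q)                    # every block now holds b
    xs = [x ^ (w & 1) for x, w in zip(xs, work)]   # CNOT from each block's low bit
    work = fanout_q_inverse(work, b, q)            # uncompute the copies
    assert all(w == 0 for w in work)               # work blocks are clean again
    return xs
```

The controlled-NOTs use the copies only as controls, so the copies are unchanged and $F_q^{-1}$ restores them to zero.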



Theorem 4.6 For any $q \in \mathbb{N}$, $q \ne 1$, $\mathrm{QACC} = \mathrm{QACC}[q]$.

Proof. By the preceding lemmas, fanout of bits is equivalent to the $\mathrm{MOD}_q$ function. Thus we
can do fanout, and hence $\mathrm{MOD}_2$, if we can do $\mathrm{MOD}_q$. By the result of Proposition 4.1, we can
do $\mathrm{MOD}_q$ if we can do fanout in constant depth. Hence $\mathrm{QACC} = \mathrm{QACC}[2] \subseteq \mathrm{QACC}[q]$. □

To compare these results with classical circuits requires a little care. For any Boolean function
$\phi$ with $n$ inputs and $m$ outputs, we can define a reversible version $\phi'$ on $n + m$ bits where
$\phi'(x, y) = (x, y \oplus \phi(x))$ keeps the input $x$ and XORs the output $\phi(x)$ with $y$. Then if $\phi$ has a
circuit with depth $d$ and width $w$, it is easy to construct a reversible circuit for $\phi'$ of depth $2d - 1$
where $wd$ work bits start and end in the zero state. We do this by assigning a work bit to each
gate in the original circuit, and replacing each gate with a reversible one that XORs that work
bit with the output. Then we can erase the work bits by moving backward through the layers of
the circuit.
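The reversible version $\phi'$ is easy to illustrate directly. In this sketch the wrapper `reversible` and the sample function `example_phi` are hypothetical; the point is only that $\phi'$ is its own inverse and hence a permutation of the $n + m$ bits.

```python
def reversible(phi, m):
    """Return phi'(x, y) = (x, y XOR phi(x)) for a phi with m output bits."""
    def phi_prime(x, y):
        out = phi(x)
        assert len(out) == m and len(y) == m
        return tuple(x), tuple(yi ^ oi for yi, oi in zip(y, out))
    return phi_prime

def example_phi(x):
    """A hypothetical 3-input, 2-output Boolean function."""
    return (x[0] & x[1], x[1] | x[2])

example_phi_prime = reversible(example_phi, 2)
```

Applying `example_phi_prime` twice returns the original pair `(x, y)`, because XORing $\phi(x)$ into $y$ twice cancels.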

Then if we adopt the convention that a Boolean function with $n$ inputs and $m$ outputs is in a
quantum circuit class if its reversible version is, we clearly have, for any $k$, $\mathrm{AC}^k \subseteq \mathrm{QAC}^k_{wf}$ and
$\mathrm{ACC}^k \subseteq \mathrm{QACC}^k$. Thus we have

$$\mathrm{AC}^0 \subset \mathrm{ACC}^0[2] \subset \mathrm{ACC}^0 \subseteq \mathrm{QAC}^0_{wf} = \mathrm{QACC}[q] = \mathrm{QACC}$$

showing that $\mathrm{QAC}^0_{wf}$ and $\mathrm{QACC}[2]$ are more powerful than $\mathrm{AC}^0$ and $\mathrm{ACC}^0[2]$ respectively.

Interestingly, if $\mathrm{QAC}^0$ as we first defined it cannot do fanout, i.e. if $\mathrm{QAC}^0 \subset \mathrm{QAC}^0_{wf}$, then in a
sense it fails to include $\mathrm{AC}^0$, since the fanout function from $\{0, 1\}$ to $\{0, 1\}^n$ is trivially in $\mathrm{AC}^0$.
However, it is not clear whether it fails to include any $\mathrm{AC}^0$ functions with a one-bit output. On
the other hand, if $\mathrm{QAC}^0$ can do fanout, it can also do parity and is greater than $\mathrm{AC}^0$, so either
way $\mathrm{AC}^0$ and $\mathrm{QAC}^0$ are different. We are indebted to Pascal Tesson for pointing this out.



5 Upper Bounds 

In this section, we prove the upper bounds results $\mathrm{NQACC}^{\log}_{gates} \subseteq \mathrm{TC}^0$, $\mathrm{BQACC}^{\log}_{Q,gates} \subseteq \mathrm{TC}^0$,
$\mathrm{NQACC}^{\log}_{pl} \subseteq$ P/poly, and $\mathrm{BQACC}^{\log}_{Q,pl} \subseteq$ P/poly.

Suppose $\{F_n\}$ and $\{z_n\}$ determine a language $L$ in NQACC. Let $F_n$ be the product of the layers
$U_1, \ldots, U_t$ and $E$ be the distinct entries of the matrices used in the $U_j$'s. By our definition of
QACC, the size of $E$ is fixed with respect to $n$. We need a canonical way to write sums and
products of elements in $E$ to be able to check $|\langle z_n | U_1 \cdots U_t | x, 0^{p(n)} \rangle|^2 > 0$ with a $\mathrm{TC}^0$ function. To
do this let $A = \{\alpha_i\}_{1 \le i \le m}$ be a maximal algebraically independent subset of $E$. Let $F = \mathbb{Q}(A)$
and let $B = \{\beta_i\}_{0 \le i < d}$ be a basis for the field $G$ generated by the elements in $(E - A) \cup \{1\}$ over
$F$. Since the size of the bases of $F$ and $G$ are less than the cardinality of $E$ the size of these bases
is also fixed with respect to $n$.

As any sum or product of elements in $E$ is in $G$, it suffices to come up with a canonical form
for elements in $G$. Our representation is based on Yamakami and Yao [31]. Let $\alpha \in G$. Since $B$
is a basis, $\alpha = \sum_{j=0}^{d-1} \lambda_j \beta_j$ for some $\lambda_j \in F$. We encode an $\alpha$ as a $d$-tuple (we iterate the pairing
function from the preliminaries to make $d$-tuples) $\langle \ulcorner\lambda_0\urcorner, \ldots, \ulcorner\lambda_{d-1}\urcorner \rangle$ where $\ulcorner\lambda_j\urcorner$ encodes $\lambda_j$. As the
elements of $A$ are algebraically independent, each $\lambda_j = s_j/u_j$ where $s_j$ and $u_j$ are of the form

$$\sum_{k_j,\ |k_j| \le e} a_{k_j} \Big( \prod_{i=1}^{m} \alpha_i^{k_{ij}} \Big).$$

Here $k_j = (k_{1j}, \ldots, k_{mj}) \in \mathbb{Z}^m$, $|k_j|$ is $\sum_i k_{ij}$, $a_{k_j} \in \mathbb{Z}$, and $e \in \mathbb{N}$. In particular, any product
$\beta_m \cdot \beta_l = \sum_{j=0}^{d-1} \lambda_j \beta_j$ with $\lambda_j = s_j/u_j$ and $s_j$ and $u_j$ in this form. We take a common denominator
$u$ for elements of $E \cup \{\beta_m \cdot \beta_l\}$ and not just $E$ since the $\lambda_j$'s associated with the $\beta_m \cdot \beta_l$ might
have additional factors in their denominators not in $E$. Also fix an $e$ large enough to bound the
$|k_j|$'s which might appear in any element of $E$ or a product $\beta_m \cdot \beta_l$. This $e$ will be constant with
respect to $n$. In multiplying $t$ layers of QACC circuit against an input, the entries in the result
will be polynomial sums and products of elements in $E \cup \{\beta_m \cdot \beta_l\}$, so we can bound $|k_j|$ for $k_j$'s
which appear in the $\lambda_j$'s of such an entry by $e \cdot p(n)$. To complete our representation of $\alpha \in G$ we
encode $\lambda_j$ as the sequence $\langle r, \langle\langle a_{k_j}, k_{1j}, \ldots, k_{mj} \rangle\rangle \rangle$ where $r$ is the power to which $u$ is raised and
$\langle\langle a_{k_j}, k_{1j}, \ldots, k_{mj} \rangle\rangle$ is the sequence of $\langle a_{k_j}, k_{1j}, \ldots, k_{mj} \rangle$'s that appear in $s_j$. By our discussion,
the encoding of an $\alpha$ that appears as an entry in the output after applying a QACC operator to
the input is of polynomial length and so can be manipulated in $\mathrm{TC}^0$.
We have need of the following lemma: 

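Lemma 5.1 Let $p$ be a polynomial. (1) Let $f(i, x) \in \mathrm{TC}^0$ output encodings of $a_{i,x} \in \mathbb{Z}[A]$. Then
$\mathbb{Z}[A]$ encodings of $\sum_{i=1}^{p(|x|)} a_{i,x}$ and $\prod_{i=1}^{p(|x|)} a_{i,x}$ are $\mathrm{TC}^0$ computable. (2) Let $f(i, x) \in \mathrm{TC}^0$ output
encodings of $a_{i,x} \in G$. Then $G$ encodings of $\sum_{i=1}^{p(|x|)} a_{i,x}$ and $\prod_{i=1}^{p(|x|)} a_{i,x}$ are $\mathrm{TC}^0$ computable.

Proof. We will abuse notation in this proof and identify the encoding $f(i, x)$ with its value $a_{i,x}$.
So $\sum_i f(i, x)$ and $\prod_i f(i, x)$ will mean the encoding of $\sum_i a_{i,x}$ and $\prod_i a_{i,x}$ respectively.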

(1) To do sums, the first thing we do is form the list $L1 = \langle f(0, x), \ldots, f(p(|x|), x) \rangle$. Then
we create a flattened list $L2$ from this with elements which are the $\langle a_{k_j}, k_{1j}, \ldots, k_{mj} \rangle$'s from the
$f(i, x)$'s. $L1$ is in $\mathrm{TC}^0$ using our definition of sequence from the preliminaries, and closure under
sums and $\max_i$ to find the length of the longest $f(i, x)$. To flatten $L1$ we use $\max_i$ to find the
length $d$ of the longest $f(i, x)$ for $i \le p(|x|)$. Then using max twice we can find the length of the
longest $\langle a_{k_j}, k_{1j}, \ldots, k_{mj} \rangle$. This will be the second coordinate in the pair used to define sequence
$L2$. We then do a sum of size $d \cdot p(|x|)$ over the subentries of $L1$ to get the first coordinate
of the pair used to define $L2$. Given $L2$, we make a list $L3$ of the distinct $k_j$'s that appear as
$\langle a_{k_j}, k_{1j}, \ldots, k_{mj} \rangle$ in some $f(i, x)$ for some $i \le p(|x|)$. This list can be made from $L2$ using sums,
cond and $\mu$. We sum over the $t \le \mathrm{length}(L2)$ and check if there is some $t' < t$ such that the $t'$th
element of $L2$ has the same $k_j$ as $t$ and if not add the $t$th element's $k_j$ times 2 raised to the appropriate
power. We know what power by computing the sum of the number of smaller $t'$ that passed this
test. Using cond and closure under sums we can compute in $\mathrm{TC}^0$ a function which takes a list like
$L2$ and a $k_j$ and returns the sum of all the $a_{k_j}$'s in this list. So using this function and the lists
$L2$ and $L3$ we can compute the desired encoding.

For products, since the $\alpha_i$'s of $A$ are algebraically independent, $\mathbb{Z}[A]$ is isomorphic to the
polynomial ring $\mathbb{Z}[y_1, \ldots, y_m]$ under the natural map which takes $\alpha_j$ to $y_j$. We view our encodings
$f(i, x)$ as $m$-variate polynomials in $\mathbb{Z}[y_1, \ldots, y_m]$. We describe for any $p'$ a circuit that works for
any $\mathrm{TC}^0$ computable $f(i, x)$ such that $\prod_i f(i, x)$ is of degree less than $p'$ viewed as an $m$-variate
polynomial. In $\mathrm{TC}^0$ we define $g(i, x)$ to consist of the sequence of polynomially many integer values
which result from evaluating the polynomial encoded by $f(i, x)$ at the points $(i_1, \ldots, i_m) \in \mathbb{N}^m$
where $0 \le i_s$ and $\sum_s i_s \le p'$. To compute $f(i, x)$ at a point involves computing a polynomial sum
of a polynomial product of integers, and so will be in $\mathrm{TC}^0$. Using closure under polynomial integer
products we compute $k(j, x) := \prod_i \beta(j, g(i, x))$ where $\beta$ is the sequence projection function from
the preliminaries. Our choice of points is what is called by Chung and Yao the $p'$-th order
principal lattice of the $m$-simplex given by the origin and the points $p'$ from the origin in each
coordinate axis. By Theorems 1 and 4 of that paper (proved earlier by a harder argument in
Nicolaides) the multivariate Lagrange Interpolant of degree $p'$ through the points $k(j, x)$ is
unique. This interpolant is of the form $P(y_1, \ldots, y_m) = \sum_j p_j(y_1, \ldots, y_m) k(j, x)$ where the $p_j$'s
are polynomials which do not depend on the function $f$. An explicit formula for these $p_j$'s is
given in Corollary 2 of Chung and Yao as a polynomial product of linear factors. Since these
polynomials are all of degree less than $p'$, they have only polynomial in $p'$ many coefficients and in
PTIME these coefficients can be computed by iteratively multiplying the linear factors together.
We can then hard code these $p_j$'s (since they don't depend on $f$) into our circuit and with these
$p_j$'s, $k(j, x)$, and closure under sums we can compute the polynomial of the desired product in
$\mathrm{TC}^0$.
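The evaluate, multiply, interpolate strategy can be illustrated in one variable. The sketch below is a deliberate simplification and an assumption on our part: it uses ordinary univariate Lagrange interpolation at the points $0, 1, \ldots$ rather than the multivariate principal lattice of Chung and Yao, exact rational arithmetic instead of $\mathrm{TC}^0$ circuits, and hypothetical function names.

```python
from fractions import Fraction

def evaluate(coeffs, t):
    """Horner evaluation; coeffs[i] is the coefficient of y**i."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc

def poly_mul_linear(p, root):
    """Multiply the polynomial p by the linear factor (y - root)."""
    out = [Fraction(0)] * (len(p) + 1)
    for i, c in enumerate(p):
        out[i + 1] += c
        out[i] -= root * c
    return out

def lagrange_basis(points, j):
    """Coefficients of the j-th basis polynomial; depends only on the points."""
    basis = [Fraction(1)]
    for m, t in enumerate(points):
        if m != j:
            basis = poly_mul_linear(basis, t)
            basis = [c / (points[j] - t) for c in basis]
    return basis

def product_by_interpolation(polys, degree_bound):
    """Product of integer polynomials whose product has degree < degree_bound."""
    points = list(range(degree_bound))            # fixed evaluation points
    values = [1] * len(points)
    for p in polys:                               # k(j) = prod_i p_i(point_j)
        values = [v * evaluate(p, t) for v, t in zip(values, points)]
    result = [Fraction(0)] * degree_bound
    for j, v in enumerate(values):
        for i, c in enumerate(lagrange_basis(points, j)):
            result[i] += c * v
    return [int(c) for c in result]
```

The basis polynomials depend only on the evaluation points, not on the inputs, which mirrors the remark that the $p_j$'s can be hard coded into the circuit.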

(2) We do sums first. Assume $f(i, x) := \sum_{j=0}^{d-1} \lambda_{ij} \beta_j$. One immediate problem is that the $\lambda_{ij}$
and $\lambda_{i'j}$ might use different $u^r$'s for their denominators. Since $\mathrm{TC}^0$ is closed under poly-sized
maximum, it can find the maximum value $r_0$ to which $u$ is raised. Then it can define a function
$g(i, x) = \sum_{j=0}^{d-1} \gamma_{ij} \beta_j$ which encodes the same element of $G$ as $f(i, x)$ but where the denominators
of the $\gamma_{ij}$'s are now $u^{r_0}$. If $\lambda_{ij}$ was $s_{ij}/u^r$ we need to compute the encoding $s_{ij} \cdot u^{r_0 - r}/u^{r_0}$. This is
straightforward from (1). Now

$$\sum_{i=1}^{p(|x|)} f(i, x) = \sum_{i=1}^{p(|x|)} g(i, x) = \sum_{j=0}^{d-1} \Big[ \Big( \sum_{i=1}^{p(|x|)} s_{ij} \Big) / u^{r_0} \Big] \beta_j,$$

where $s_{ij}$'s are the numerators of the $\gamma_{ij}$'s in $g(i, x)$. From part (1) we can compute the encoding
$e_j$ of $\sum_{i=1}^{p(|x|)} s_{ij}$ in $\mathrm{TC}^0$. So the desired answer $\langle \langle r_0, e_0 \rangle, \ldots, \langle r_0, e_{d-1} \rangle \rangle$ is in $\mathrm{TC}^0$.

For products $\prod_{i=1}^{p(|x|)} f(i, x)$, we play the same trick as in the $\mathbb{Z}[A]$ product case. We view
our encodings of elements of $G$ as $d$-variate polynomials in $F(y_0, \ldots, y_{d-1})$ under the map $\beta_k$ goes
to $y_k$. (Note that this map is not necessarily an isomorphism.) We then create a function $g(i, x)$
which consists of the sequence of values obtained by evaluating $f(i, x)$ at polynomially many points
in a lattice as in the first part of this lemma. Evaluating $f(i, x)$ at a point can easily be done
using the first part of this lemma. We then use part (1) of this lemma to compute the products
$k(j, x) = \prod_i \beta(j, g(i, x))$. We then get the interpolant $P(y_0, \ldots, y_{d-1}) = \sum_j p_j(y_0, \ldots, y_{d-1}) k(j, x)$. We
non-uniformly obtain the encoding of $p_j(\beta_0, \ldots, \beta_{d-1})$ expressed as an element of $G$, i.e., in the
form $\sum_{w=0}^{d-1} \lambda_{jw} \beta_w$. Thus, the product $\prod_{i=1}^{p(|x|)} f(i, x)$ is

$$\sum_{w=0}^{d-1} \Big( \sum_j \lambda_{jw} k(j, x) \Big) \beta_w.$$

The encoding of the products is the $d$-tuple given by $\langle \sum_j \lambda_{j0} k(j, 0), \ldots, \sum_j \lambda_{j,d-1} k(j, d-1) \rangle$. Each
of its components is a polynomial sum of a product of two things in $F$ and can be computed using
the first part of the lemma. □

For $\{F_n\} \in \mathrm{QAC}^0_{wf} = \mathrm{QACC}$, the vectors that $F_n$ act on are elements of a $2^{n+p(n)}$ dimensional
space $\mathcal{E}_{1,n+p(n)}$ which is a tensor product of the 2-dimensional spaces $\mathcal{E}_1, \ldots, \mathcal{E}_{n+p(n)}$, which in turn
are each spanned by $|0\rangle$, $|1\rangle$. We write $\mathcal{E}_{j,k}$ for the subspace $\otimes_{i=j}^{k} \mathcal{E}_i$ of $\mathcal{E}_{1,n+p(n)}$. We now define a
succinct way to represent a set of vectors in $\mathcal{E}_{1,n+p(n)}$ which is useful in our argument below. A
tensor graph is a directed acyclic graph with one source node of indegree zero, one terminal node
of outdegree zero, and two kinds of edges: horizontal edges, which are unlabeled, and vertical
edges, which are labeled with a pair of amplitudes and a product of colors and anticolors (which
are defined below). We require that all paths from the source to the terminal traverse the same
number of vertical edges and that no vertex can have vertical edge indegree greater than one or
outdegree greater than one. The height of a node in a tensor graph is the number of vertical
edges traversed to get to it on any path from the source; the height of an edge is the height of its
end node. The width of a tensor graph is the maximum number of nodes of the same height. As an
example of a tensor graph where our color product is the number 1, consider the following figure:



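[Figure: an example tensor graph with source node s and terminal node t. Every vertical edge carries the color product {1} together with a pair of amplitudes, such as 0,1 or 1,0 or 1/2,0.]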



The rough idea of tensor graphs is that paths through the graph correspond to collections of
vectors in $\mathcal{E}_{1,n}$. For this particular figure the left path from the source node (s) to the terminal
node (t) corresponds to the vectors given by

$$|1\rangle \otimes \Big( \tfrac{1}{\sqrt{2}} |0\rangle + \tfrac{1}{\sqrt{2}} |1\rangle \Big) \otimes \tfrac{1}{2} |0\rangle$$

and the right hand path corresponds to

$$|0\rangle \otimes \Big( \tfrac{1}{\sqrt{2}} |0\rangle + \tfrac{1}{\sqrt{2}} |1\rangle \Big) \otimes \tfrac{1}{2} |0\rangle.$$

A $\mathcal{E}_{j,k}$-term in a tensor graph is a maximal induced tensor subgraph between a node of height
$j - 1$ and a node of height $k$. If the horizontal indegree of the node at height $j - 1$ is zero and
the horizontal outdegree of the node at height $k$ is zero then we say the term is good. For the
graph we considered above there are two good $\mathcal{E}_{1,2}$-terms and two good $\mathcal{E}_{2,3}$-terms but only one
$\mathcal{E}_{1,3}$-term corresponding to the whole figure.

"Colors" are used to handle controlled-not layers. A color $c$ and its anticolor $\bar{c}$ are defined to
obey the following multiplicative properties: $c \cdot c = \bar{c} \cdot \bar{c} = 1$ and $c \cdot \bar{c} = 0$. Given a color $b$ and a
product of colors $c$ not involving $b$ or its anticolor we require $b \cdot c = c \cdot b$ and $\bar{b} \cdot c = c \cdot \bar{b}$. If $a$ is a
product of colors and anticolors not involving the color $b$ or $\bar{b}$ and $c$ is another product of colors we
have $a(bc) = (ab)c$. We consider formal sums of products of complex numbers times colors. We
require complex numbers to commute with colors and require colors and anticolors to distribute,
i.e., if $a, b, c$ are colors or anticolors then $a \cdot (b + c) = a \cdot b + a \cdot c$ and $(b + c) \cdot a = b \cdot a + c \cdot a$. Finally, we
require addition to work so that the above structure satisfies the axioms of a $\mathbb{C}$-algebra. Given
a tensor graph $G$ denote this $\mathbb{C}$-algebra by $A_G$. Since

$$(\bar{a} \cdot a) \cdot a \ne \bar{a} \cdot (a \cdot a)$$

this algebra is not associative. However, in the sums we will consider, the terms will never have
more than two positions where a color or its anticolor can occur, so the products we will consider
are associative.

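Using our earlier encoding for the elements of $\mathbb{C}$ which could appear in a QACC computation,
it is straightforward to use sequence coding to get $\mathrm{TC}^0$ encodings of the relevant elements of
$A_G$. As an example of how colors affect amplitudes, consider the following picture:

[Figure: an example tensor graph from s to t with two dotted paths, whose vertical edges are labeled with the color $b$ or its anticolor $\bar{b}$ and amplitude pairs such as 1,0 and 0,1.]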



The amplitude of $|1, 0, 0\rangle$ in the left hand dotted path is $b \cdot \frac{1}{\sqrt{2}} \cdot 1 \cdot \frac{1}{\sqrt{2}} \cdot b \cdot 1 = 1/2$ using commutativity
and $b^2 = 1$. Its amplitude in the right hand dotted path would be zero because of the last vertical
edge. However, vectors such as $|0, 0, 1\rangle$ would have nonzero amplitude in the right hand dotted
path. Nevertheless, the amplitude of any vector $|x\rangle$ in any path other than the dotted ones from
s to t will be 0 as $b \cdot \bar{b} = 0$. More formally, we define the amplitude of an $|x\rangle$ in a vertical edge as
equal to the left amplitude times the color product in the edge if $|x\rangle$ is $|0\rangle$ and equal to the right
amplitude times the color product in the edge if $|x\rangle$ is $|1\rangle$. The amplitude of a vector $|x_1, \ldots, x_j\rangle$
in a path in a tensor graph is the product over $k$ from 1 to $j$ of the amplitude of the vectors $|x_k\rangle$ in
the vertical edge of height $k$. The amplitude of a vector $|x_j, \ldots, x_k\rangle$ in an $\mathcal{E}_{j,k}$-term is the sum of
its amplitude in its paths. The amplitude of a vector $|x_1, \ldots, x_{p(n)}\rangle$ in a tensor graph $G$ is defined
to be the sum of its amplitudes in $G$'s $\mathcal{E}_{1,p(n)}$-terms.

As we will be interested in families of tensor graphs $\{G_n\}$ corresponding to our circuit families,
we want to look at those families with a certain degree of uniformity. We say a family of tensor
graphs $\{G_n\}$ is color consistent if: (1) the number of colors for edges of the same height is bounded
by a constant $k$ with respect to $n$, (2) the number of heights in which a given color/anticolor can
appear is exactly two (colors and their anticolors must appear on the same heights), (3) each color
product at the same height is of the form $\prod_{i=0}^{k-1} l_i$ where $l_i$ must be either a color $c_i$ or $\bar{c}_i$ (it follows
there are $2^k$ possible color products for edges at a given height). We say that a color/anticolor
is active at a given height if the height is at or after the first height at which the color/anticolor
occurs and is below the height of its second occurrence. The family is further said to be log-color
depth if the number of active colors/anticolors of a given height is log-bounded.

Theorem 5.1 Let $\{F_n\}$ be a family of QACC operators and let $\{\langle z_n|\}$ be a family of observables. (1)
There is a color-consistent family of tensor graphs of width $2^{2^{2t}}$ and polynomial size representing
the output amplitudes of $U_1 \cdots U_t |z_n\rangle$ where $U_i$ are the layers of $F_n$. (2) If $\{F_n\}$ is in $\mathrm{QACC}^{\log}_{pl}$
then the family of tensor graphs will be of log-color depth. (3) If $\{F_n\}$ is in $\mathrm{QACC}^{\log}_{gates}$ then the
number of paths from the source to the terminal node is polynomially bounded.



Proof. The proof is by induction on $t$. In the base case, $t = 0$, we do not multiply any layers, and
we can easily represent this as a tensor graph of width 1. Assume for $j < t$ that $U_j \cdots U_1 |x, 0^{p(n)}\rangle$
can be written as a color consistent tensor graph of width $2^{2^{2j}}$ and polynomial size. There are two
cases to consider: In the first case the layer is a tensor product of matrices $M_1 \otimes \cdots \otimes M_\nu$ where
the $M_k$'s are Toffoli gates, one qubit gates, or fan-out gates (since $\mathrm{QAC}^0_{wf} = \mathrm{QACC}$); in the second
case the layer is a controlled-not layer.

For the first case we "multiply" $U_t$ against our current graph by "multiplying" each $M_j$ in
parallel against the terms in our sum corresponding to $M_j$'s domain, say $\mathcal{E}_{j',k'}$. If
$M_j = \begin{pmatrix} u_{00} & u_{01} \\ u_{10} & u_{11} \end{pmatrix}$
with domain $\mathcal{E}_{j'}$ is a one-qubit gate, then we multiply the two amplitudes in each vertical edge
of height $j'$ in our tensor graph by $M_j$. This does not affect the width, size, or number of paths
through the graph. If $M_j$ is a Toffoli gate, then for each good term $S$ in $\mathcal{E}_{j',k'}$ in our tensor graph
we add one new term to the resulting graph. This term is added by adding a horizontal edge going
out from the source node of $S$ followed by the new $\mathcal{E}_{j',k'}$-term followed by a horizontal edge into
the terminal node of $S$. The new term is obtained from $S$ by setting to 0 the left hand amplitudes
of all edges in $S$ of height between $j'$ and $k' - 1$ and then if $\alpha, \gamma$ is the amplitude of an edge of
height $k'$ in the new term we change it to $\gamma - \alpha, \alpha - \gamma$. This new term adjusts the amplitude
for the case of an all-$|1\rangle$ vector in $\mathcal{E}_{j',k'-1}$ tensored with either a $|0\rangle$ or $|1\rangle$. This operation
increases the width of the new tensor graph by the width of the good $\mathcal{E}_{j',k'}$-term for each good
$\mathcal{E}_{j',k'}$-term in the graph. Since the original graph has width $2^{2^{2(t-1)}}$ there are at most this many
starting and ending vertices for such terms. So there are at most $(2^{2^{2(t-1)}})^2$ such terms. Each of these
terms has width at most $2^{2^{2(t-1)}}$. Thus, the new width is at most

$$2^{2^{2(t-1)}} + (2^{2^{2(t-1)}})^2 \cdot 2^{2^{2(t-1)}} \le 2^{2^{2t}}.$$

Notice this action adds one new path through the $\mathcal{E}_{j',k'}$ part of the graph for every existing one.
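The width arithmetic can be checked with exact integers. This sketch assumes the bound reads $2^{2^{2t}}$ after $t$ layers (our reading of the garbled exponents) and uses a hypothetical helper name; writing $W$ for the old width, the claim is $W + W^2 \cdot W \le W^4$.

```python
def width_bound(t):
    """The assumed bound 2^(2^(2t)) on the width after t layers."""
    return 2 ** (2 ** (2 * t))

for t in range(1, 5):
    old = width_bound(t - 1)
    terms = old ** 2            # at most one good term per (start, end) pair
    new = old + terms * old     # each added term is at most `old` wide
    assert new <= width_bound(t)
```

The inequality holds for every $W \ge 2$, since $W + W^3 \le 2W^3 \le W^4$.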

Now suppose $M_j$ is a fan-out gate, let $S$ be a good $\mathcal{E}_{j',k'}$-term in our tensor graph and let $e$ be
any vertical edge in $S$ in $\mathcal{E}_{k'}$. Suppose $e$ has amplitude $\alpha$ for $|0\rangle$ and amplitude $\gamma$ for $|1\rangle$. In the
new graph we change the amplitude of $e$ to $\alpha, 0$. We then add a horizontal edge out of the source
node of $S$ followed by a new $\mathcal{E}_{j',k'}$-term followed by a horizontal edge into the terminal node of $S$.
The new term is obtained from $S$ by changing the amplitude for edges in $\mathcal{E}_{k'}$ with amplitudes $\alpha, \gamma$
in $S$ to $0, \gamma$. The amplitudes of the non-$\mathcal{E}_{k'}$ edges in this term are the reverse of the corresponding
edge in $S$, i.e., if the edge in $S$ had amplitude $\delta, \zeta$ then the new term edge would have amplitude
$\zeta, \delta$. The same argument as in the Toffoli case shows the new width is bounded by $2^{2^{2t}}$ and that
this action adds one new path through the $\mathcal{E}_{j',k'}$ part of the graph for every existing one.

For the case of a controlled-not layer, suppose we have a controlled-not going from line $i$ onto
line $j$. Let $c, \bar{c}$ be a new color, anti-color pair not yet appearing in the graph. Let $e_i$ be a vertical
edge of height $i$ in the graph and let $c_i, \alpha_i, \gamma_i$ be respectively its color product and two amplitudes.
Similarly, let $e_j$ be a vertical edge of height $j$ in the graph and $c_j, \alpha_j, \gamma_j$ be its color product and
two amplitudes. In the new graph we multiply $c$ times the color product of $e_i$ and $e_j$ and change
the amplitude of $e_i$ to $\alpha_i, 0$. We then add a horizontal edge going out from the starting node of $e_i$,
followed by a vertical edge with values $c_i \cdot \bar{c}, 0, \gamma_i$ followed by a horizontal edge into the terminal
node of $e_i$. In turn, we add a horizontal edge going out of the starting node of $e_j$, followed by
a vertical edge with values $c_j \cdot \bar{c}, \gamma_j, \alpha_j$ followed by a horizontal edge into the terminal node of
$e_j$. We handle all other controlled gates in this layer in a similar fashion (recall they must go
to disjoint lines). We add at most a new vertex of a given height for every existing vertex of a
given height. So the total width is at most doubled by this operation and $2 \cdot 2^{2^{2(t-1)}} < 2^{2^{2t}}$. In
the $\mathrm{QACC}^{\log}_{pl}$ case, simulating a layer which is a Kronecker product of spaced controlled-not gates
and identity matrices, notice we would at most add one to the color depth at any place. So if a
controlled-not layer is a composition of $O(\log n)$ many such layers it will increase the color depth
by $O(\log n)$. In the $\mathrm{QACC}^{\log}_{gates}$ case, notice that simulating a single controlled-not we add one new
path for each existing path through the graph at each of the two heights affected. This gives three
new paths on the whole subspace for each old one.
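The role of the new color pair can be illustrated on two qubits. The sketch below is a toy model under our own assumptions, not the paper's data structure: each height holds a list of parallel vertical edges, a color is a pair (name, is_anticolor), the pre-existing color products $c_i, c_j$ are taken to be empty, and a path contributes only if it never meets both a color and its anticolor. It checks that the four paths created by the construction reproduce a controlled-not applied to a product state.

```python
from itertools import product


def color_product_is_zero(labels):
    """A color product vanishes iff some color appears together with its anticolor."""
    colors = {name for name, is_anti in labels if not is_anti}
    anticolors = {name for name, is_anti in labels if is_anti}
    return bool(colors & anticolors)


def colored_graph_amplitudes(heights):
    """heights[h] lists the parallel edges at height h as (labels, (amp0, amp1)).
    Sum over all paths (one edge per height) with a nonzero color product."""
    amps = {bits: 0 for bits in product((0, 1), repeat=len(heights))}
    for path in product(*heights):
        labels = [lab for edge_labels, _ in path for lab in edge_labels]
        if color_product_is_zero(labels):
            continue  # a color met its anticolor: this path contributes nothing
        for bits in amps:
            term = 1
            for (_, pair), b in zip(path, bits):
                term *= pair[b]
            amps[bits] += term
    return amps


def cnot_on_product_state(ctrl, tgt):
    """Direct computation with the control at height 0 and the target at height 1."""
    return {(c, t ^ c): ctrl[c] * tgt[t] for c, t in product((0, 1), repeat=2)}


alpha_i, gamma_i = 3, 5   # amplitudes on the control line i
alpha_j, gamma_j = 2, 7   # amplitudes on the target line j
c, c_bar = ("c", False), ("c", True)
heights = [
    [([c], (alpha_i, 0)), ([c_bar], (0, gamma_i))],               # height i
    [([c], (alpha_j, gamma_j)), ([c_bar], (gamma_j, alpha_j))],   # height j
]
assert colored_graph_amplitudes(heights) == cnot_on_product_state(
    (alpha_i, gamma_i), (alpha_j, gamma_j)
)
```

Of the four paths through the two heights, the two that mix $c$ with $\bar{c}$ vanish, which is how the colors tie the choice at height $i$ to the choice at height $j$ without a multi-height term.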

Since we have handled the two possible layer cases and the changes we needed to make only 
increase the resulting tensor graph polynomially, we thus have established the induction step and 
(1) and (2) of the theorem. For (3), observe for each multi-line gate we handle in adding a layer we 
at most quadruple the number of paths through the subspace where that gate applies. Since there 
are at most logarithmically many such gates, the number of paths through the graph increases 
polynomially. □
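To spell out the count behind (3): assuming there are at most $c \log_2 n$ multi-line gates for some constant $c$ (the constant is our notation, not the paper's), and each gate at most quadruples the number of paths, the total growth factor is at most

$$4^{c \log_2 n} = \left(2^{\log_2 n}\right)^{2c} = n^{2c},$$

which is polynomial in $n$.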

Theorem 5.2 Let $\{G_n\}$ be a family of constant width color-consistent tensor graphs of vectors
in $\mathcal{E}_{1,p(n)}$. Assume the coefficients of amplitudes in the $\{G_n\}$ can be encoded in TC° using our
encoding scheme described earlier and that $\{G_n\}$ has log-color depth. Then the amplitude of any
basis vector of $\mathcal{E}_{1,p(n)}$ in $G_n$ is P/poly computable. If the number of paths through the graph from
the source to the terminal node is polynomially bounded then the amplitude of any basis vector is
TC° computable.

Proof. Let $G_n$ be a particular graph in the family and let $|x_n\rangle$ be the vector whose amplitude
we want to compute. Assume that all graphs in our family have fewer than $k$ colors in any color
product and have a width bounded by $w$. We will proceed from the source to the terminal node
one height at a time to compute the amplitude. Since the width is $w$ the number of $\mathcal{E}_{1,1}$-terms is at
most $w$ and each of these must have width at most $w$. Let $a_{1,1}, \ldots, a_{1,w}$ (some of which may be
zero) denote the amplitudes in $A_{G_n}$ of $|x_{n,1}\rangle$ in each of these terms. The $a_{1,i}$ are each sums of at
most $w$ amplitudes times the color products of at most $k$ colors and anticolors, so the encoding of
these $w$ amplitudes is TC° computable. Because of the restriction on the width of $G_n$ there are at
most $w$ many $\mathcal{E}_{1,j}$-terms, $w^2$ many $\mathcal{E}_{j,j+1}$-terms, and $w$ many $\mathcal{E}_{1,j+1}$-terms. Fixing some ordering
on the nodes of height $j$ and $j+1$ let $\gamma_{j,i,k}$ be the amplitude of $|x_{n,j+1}\rangle$ in the $\mathcal{E}_{j,j+1}$-term with
source the $i$th node of height $j$ and with terminal node the $k$th node of height $j+1$. The amplitude
is zero if there is no such $\mathcal{E}_{j,j+1}$-term. Then the amplitudes $a_{j+1,1}, \ldots, a_{j+1,w}$ of the $\mathcal{E}_{1,j+1}$-terms
can be computed from the amplitudes $a_{j,1}, \ldots, a_{j,w}$ of the $\mathcal{E}_{1,j}$-terms using the formula

$$a_{j+1,k} = \sum_{i=1}^{w} a_{j,i} \cdot \gamma_{j,i,k}.$$

Thus $a_{j+1,k}$ can be computed from the $a_{j,i}$ using a polynomial sized circuit to do these adds and
multiplies. Similarly, each $a_{j+2,k}$ can be computed by polynomial sized circuits from the $a_{j+1,i}$'s
and so on. Since we have log-color depth the number of terms consisting of elements in our field
times color products in an $a_{j,k}$ will be polynomial. So the size of the $a_{j,k}$, $j \le p(n)$, $k \le w$, will be
polynomial in the input $x_n$, and hence the circuits for each $a_{j,k}$ with $j \le p(n)$ and $k \le w$ will
be of polynomial size. There is only one $\mathcal{E}_{1,p(n)}$-term in $G_n$ and its amplitude is that of $|x_n\rangle$, so this
shows it has polynomial sized circuits. For the TC° result, if the number of paths is polynomially
bounded, then the amplitude can be written as the polynomial sum of the amplitudes in each path.
The amplitude in a path can in turn be calculated as a polynomial product of the amplitudes
times the colors on the vertical edges in the path. Our condition on every color appearing at
exactly two heights guarantees the color product along the whole path will be 1 or 0, and will be
zero iff we get a color and its anticolor on the path. This is straightforward to check in TC°, so
this sum of products can thus be computed in TC° using Lemma 5.1. □
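The two evaluation strategies in the proof can be contrasted in a small sketch. It is a simplification under our own assumptions: amplitudes are plain integers, color products are not modeled, and `first` and `transitions` are hypothetical containers for the $a_{1,i}$ and the $\gamma_{j,i,k}$. The layered routine implements the recurrence $a_{j+1,k} = \sum_i a_{j,i} \gamma_{j,i,k}$, and the path routine sums one product per source-to-terminal path.

```python
from itertools import product


def amplitude_by_layers(first, transitions):
    """first[i] = a_{1,i}; transitions[j][i][k] = gamma_{j,i,k}.
    Advances one height at a time using a_{j+1,k} = sum_i a_{j,i} * gamma_{j,i,k}."""
    a = list(first)
    for gamma in transitions:
        width_next = len(gamma[0])
        a = [sum(a[i] * gamma[i][k] for i in range(len(a))) for k in range(width_next)]
    return a


def amplitude_by_paths(first, transitions):
    """Sums the product of amplitudes along every node sequence.
    Only efficient when the number of paths is small."""
    widths = [len(first)] + [len(gamma[0]) for gamma in transitions]
    totals = [0] * widths[-1]
    for nodes in product(*(range(w) for w in widths)):
        term = first[nodes[0]]
        for j, gamma in enumerate(transitions):
            term *= gamma[nodes[j]][nodes[j + 1]]
        totals[nodes[-1]] += term
    return totals


# Width 2 at the first two heights, then a single terminal node.
first = [1, 2]
transitions = [
    [[1, 0], [3, 1]],   # gamma_{1,i,k}
    [[2], [5]],         # gamma_{2,i,k}
]
assert amplitude_by_layers(first, transitions) == amplitude_by_paths(first, transitions)
```

The layered version does on the order of $w^2$ multiplications per height regardless of how many paths exist, which corresponds to the P/poly bound. The path version is the sum of products that the TC° argument relies on when the number of paths is polynomially bounded.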



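Corollary 5.3

(1) $\mathrm{EQACC}^{\log}_{pl} \subseteq \mathrm{NQACC}^{\log}_{pl} \subseteq$ P/poly, and $\mathrm{BQACC}^{\log}_{Q,pl} \subseteq$ P/poly.

(2) $\mathrm{EQACC}^{\log}_{gates} \subseteq \mathrm{NQACC}^{\log}_{gates} \subseteq$ TC°, and $\mathrm{BQACC}^{\log}_{Q,gates} \subseteq$ TC°.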

Proof. Given a family $\{F_n\}$ of $\mathrm{QACC}^{\log}_{pl}$ operators and a family $\{|z_n\rangle\}$ of states we can use
Theorem 5.1 to get a family $\{G_n\}$ of log color depth, color-consistent tensor graphs representing
the amplitudes of $F_n^{-1}|z_n\rangle$. Note $\{F_n^{-1}\}$ is also a family of $\mathrm{QACC}^{\log}_{pl}$ operators since Toffoli and
fan-out gates are their own inverses, the inverse of any one qubit gate is also a one qubit gate
(albeit usually a different one), and finally a controlled-not layer is its own inverse. Theorem 5.2
shows there is a P/poly circuit computing the amplitude of any vector $|x_n\rangle$ in this graph. This
amounts to calculating

$$\langle x_n | F_n^{-1} | z_n \rangle = \left( \langle z_n | F_n | x_n \rangle \right)^*.$$

If this is nonzero, then $|\langle z_n | F_n | x_n \rangle|^2 > 0$, and we know $x$ is in the language. In the $\mathrm{BQACC}_Q$ case
everything is a rational so P/poly can explicitly compute the magnitude of the amplitude and
check if it is greater than 3/4. The TC° result follows similarly from the TC° part of Theorem 5.2.
□

Finally, we note that some of the inclusions in the previous corollary can be strengthened if we
assume that the circuit families are polynomial-time uniform and their coefficients polynomial-time
computable. In particular p-uniform $\mathrm{NQACC}^{\log}_{pl}$ is contained in P and p-uniform $\mathrm{NQACC}^{\log}_{gates}$
is contained in p-uniform TC°.



6 Discussion and Open Problems 

A number of open questions are suggested by our work. 

• Is QAC° = QAC°f? That is, can the fanout gate be constructed in constant depth when 
each qubit can only act as an input to one gate in each layer? 

• Is QAC°f = QTC°? That is, can the techniques used here be extended to construct quantum 
threshold gates in constant depth? 

• Is all of NQACC in TC° or even P/poly? We conjecture that NQACC is in TC°. As
mentioned in the introduction, we have developed techniques that remove some of the important
obstacles to proving this.

• Are there any natural problems in NQACC that are not known to be in ACC? 

• What exactly is the complexity of the languages in EQACC, NQACC and $\mathrm{BQACC}_Q$? We
entertain two extreme possibilities. Recall that the class ACC can be computed by
quasipolynomial size depth 3 threshold circuits. It would be quite remarkable if EQACC could also
be simulated in that manner. However, it is far from clear if any of the techniques used in
the simulations of ACC (the Valiant-Vazirani lemma, composition of low-degree polynomials,
modulus amplification via the Toda polynomials, etc.), which seem to be inherently
irreversible, can be applied in the quantum setting. At the other extreme, it would be equally
remarkable if NQACC and NQTC° (or $\mathrm{BQACC}_Q$ and NQTC°) coincide. Unfortunately, an
optimal characterization of QACC language classes anywhere between those two extremes
would probably require new (and probably difficult) proof techniques.

• How hard are the fixed levels of QACC? While lower bounds for QACC itself seem impossible 
at present, it might be fruitful to study the limitations of small depth QACC circuits (depth 
2, for example). 

Acknowledgments: We thank Bill Gasarch for helpful comments and suggestions.